CHAPTER 1


INTRODUCTION 

1.1  Introduction  and  Background 

The role of information in modern society is of the utmost
importance. The existence of the large social and economic organizations
which characterize today's advanced societies depends on the capabilities
for producing, processing, updating, and communicating information
on a very large scale. It is well established that in an advanced society
a great portion of the labor force is involved in activities related to
information, and a high percentage of the cost of running the economy
consists of information-related costs [13].* Yet the quantitative analysis
needed to rationalize the allocation of resources to the various activities
related to the information process has been very limited.

The most important problem in dealing with information is, perhaps,
the problem of measurement. We often have an intuitive tendency to regard
information as a commodity and to apply the notions of "more" and "less"
to it just as we apply them to commodities. Upon more careful examination,
however, we find that there is no clear way we can attribute such notions
to information in a general and precise fashion [14]. Therefore, we need
an appropriate context in which information can be treated in a meaningful,
proper, and precise manner. Decision analysis provides such a context.
Decision analysis treats information explicitly and evaluates it on the
basis of its economic value for making decisions. Various pieces of
information are compared according to their economic values for making a
specific decision. It is only natural, therefore, to find that one

*Numbers  in  square  brackets  refer  to  the  References  found  at  the  end 
of  the  thesis. 


piece of information is more valuable than another for one decision,
while the opposite is true for a different decision. This would be
inconsistent if measures of information (e.g., "quantity of information")
were defined in the abstract, separate from the decision itself.

Previous  studies  of  information  in  the  context  of  decision  analysis 
have  been  mostly  static,  in  the  sense  that  they  make  no  reference  to 
the  time  of  the  decision  or  the  age  of  the  information.  In  many  real- 
world  cases,  however,  information  has  some  kind  of  time  association. 
Consider,  for  example,  a  large  organization  in  which  information  related 
to  the  activities  of  the  organization  is  accumulated  and  updated  for 
making  future  decisions.  Here  we  have  a  continuing  process  of  information 
production  and  updating  for  the  general  task  of  decision  making  in  the 
organization.  In  such  a  scenario,  information  is  dynamic  and  must  be 
studied  in  a  dynamic  framework. 

An important case in which a dynamic framework is essential for
the study of information is when the time of the decision is uncertain.
The decision must be made upon the occurrence of a precipitating event
which occurs randomly in time. The decision maker often has no control
over this event. This type of decision is referred to as a contingent
decision [5]. We can find numerous examples of contingent decisions:
firms have to make contingent decisions in response to action by their
competitors or by the government (at uncertain times); the government
often faces contingent decisions as a result of other countries' economic
or political decisions.

Due to the urgency often associated with this type of decision,
there is typically insufficient time to obtain adequate information for
the decision after the precipitating event has occurred. It is therefore
desirable to be prepared for the decision by obtaining the necessary
information in advance. A problem arises, however, because we are
generally faced with a changing environment in which the acquired
information becomes outdated and obsolete as time passes. Suppose, for
example, that we are expecting a contingent decision and that the level of
an inventory (at the decision time) is a valuable piece of information for
making the decision. If we learn the level of the inventory today, this
information would be very valuable if the decision happens today, but it
may not be as valuable if the decision happens a week from now. We may
expect that this information would be less and less valuable as time
passes. This is called information outdating or perishing. Faced with
this problem, we need to update our information regularly in order to
remain prepared for the decision. This is referred to as information
recovery or replenishment. In the above example, regular observations of
the level of the inventory would enhance our information at the time of
the decision. The more frequently the observations are made, the more
accurate our information will be at the decision time. But information
is costly, and therefore we have to balance the costs and the benefits
of information in order to find the optimum policies for the
recovery of information.

It is clear from the above discussion that the important problems
with regard to information in a dynamic framework are the outdating and
the recovery of information. These problems are analyzed in this study.

There are other decisions which are in many respects similar to
contingent decisions. As in the case of a contingent decision, there
is uncertainty about the time of the decision and therefore information
must be obtained in advance. But there is also uncertainty in the exact
nature of the decision. Consider, for example, all the data gathered
and updated by the government or private agencies in anticipation of
future use. There is uncertainty regarding the time of use as well as
the purpose (decision) for which the data will be used (although the
general area of use may be known). We will confine our study to
contingent decisions, namely the case in which there is no ambiguity in the
decision itself. The results may be useful, however, for the study of
information acquisition policies for the second type of problem.

There has been surprisingly little research in this area. A
preliminary work by Marschak and Radner [11] gives a valuable formulation
of the problem but does not establish concrete results on the dynamics
of information. This work was motivated by a recent dissertation by
Grum [5], who studied the "perishing" and "replenishment" of information
in a dynamic environment. His analysis is, however, limited to
discrete-state dynamic systems (Markov chains). In this research the problem is
formulated in the framework of continuous-state systems (following
Marschak and Radner), which allows a more systematic study of the problem
and facilitates the derivation of more general results.

1.2  A  Contingent-Decision  Example 

This example illustrates the problems of information outdating and
recovery for a contingent decision, and it is studied in detail in
subsequent chapters.

Suppose that we are expecting a bidding occasion but are uncertain
of its time. When the bidding is announced, we will have to make our
bid within a short time and, therefore, will not be able to obtain new
information on the uncertain variables at the bidding time. An important
variable for the bidding is our cost of performing the contract (p),
and clearly, information on this variable would help us make a better bid.
We can find p (at some cost) at any time. However, since this variable
changes over time, knowledge of its present value will not provide us
with perfect information about its value at a future time, in particular
at the time of the bidding. We expect that the older our information at
the bidding time, the less valuable it will be (information perishing).
Note that we are implicitly assuming a relationship among the values of
p at different points in time, and that we have some knowledge of this
relationship from past experience. If this were not the case, current
knowledge of p would have no value for making the bid in the future.

Suppose, for example, that from past experience we know that p
has a constant mean over time, and that its variation from its mean ($\Delta p$)
changes over time according to the linear Markovian model

$$\Delta p(t) = \lambda \, \Delta p(t-1) + \varepsilon(t) \tag{1}$$

where $\lambda$ is a constant and $\varepsilon(t)$ is a random "noise" term with zero
mean. This knowledge makes it possible to use our present information
about p when making our bid at a future time.
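
As a rough illustration of how model (1) lets present information serve a future bid, the following sketch computes the forecast of the deviation and the variance of the forecast error. It assumes, beyond what is stated above, that the noise terms are independent over time with a constant variance; the function names and the numbers are hypothetical.

```python
def forecast_deviation(dp0: float, lam: float, steps: int) -> float:
    """Mean of dp(steps) given dp(0) = dp0 under dp(t) = lam * dp(t-1) + e(t).

    Because the noise has zero mean, only the lam**steps * dp0 term survives.
    """
    return (lam ** steps) * dp0


def forecast_error_variance(lam: float, noise_var: float, steps: int) -> float:
    """Variance of dp(steps) around its forecast.

    Assumes independent noise terms with common variance noise_var
    (an assumption made for this sketch only).
    """
    return noise_var * sum(lam ** (2 * k) for k in range(steps))


# Hypothetical numbers: the forecast shrinks toward the mean and the
# error variance grows as the information gets older.
lam, noise_var, dp0 = 0.8, 1.0, 5.0
for age in (0, 1, 2, 5, 10):
    print(age,
          forecast_deviation(dp0, lam, age),
          forecast_error_variance(lam, noise_var, age))
```

With $|\lambda| < 1$ the forecast decays toward zero and the error variance grows with the age of the observation, which is the informal sense in which old knowledge of p loses value.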

Suppose that we have obtained information about the cost of
performing the contract at present, p(0). As time passes, this
information becomes old and we must decide when to update it. The
optimum updating time depends on how quickly the existing information
perishes, on the cost of a new observation of p, and on the likelihood
of the bidding occurring at each future point in time. The perishing of
information depends not only on the dynamics of p(t) (Eq. (1)), but
also on the decision for which the information will be used (bidding
payoff), and on the type of information (perfect or imperfect). The
optimum information recovery time may also depend on the result of our
previous observation of p(t), namely p(0). These are the types of
problems which are investigated in this research. The analysis is
generalized as far as possible in order to facilitate the application of
the results to various specific cases.

1.3  Overview 

In Chapter 2 the dynamics of information in a dynamic environment
are investigated. It is shown how these dynamics depend on the
characteristics of the dynamic environment, the decision for which the
information is used, and the type of information itself (perfect or
imperfect). The possible patterns of the value of information over time
under various conditions are investigated.

In Chapter 3 we investigate the problem of information recovery
(updating) in the face of a dynamic environment, where there is uncertainty
in the time of our decisions. We distinguish between a priori and a
posteriori policies for the recovery of information. An a priori policy is
one based on prior knowledge of the environment.
In an a posteriori policy both the prior knowledge and the result of
each observation of the environment are used in determining the optimal
policy.

In Chapter 4 the a priori information recovery policies are
investigated in detail and the conditions for optimum information recovery are
found. It is assumed in this chapter that the decision occurs only once.
The effect of risk aversion on the optimal information recovery policy
is also studied.

Chapter 5 investigates the a posteriori information recovery
policies in detail. Necessary conditions for optimum recovery policies
are found. Conditions are also found under which the a priori and the
a posteriori policies coincide.

Chapter 6 extends the results of Chapter 4 to the case of multiply
occurring decisions, namely when the decision may be repeated in time.
In this case we often have the opportunity to obtain free information
from each decision and use this information for future decisions. The
optimality conditions for recovery of information are found under various
types of information learned from decisions.

The  final  chapter  summarizes  the  results  of  the  study  and  suggests 
areas  for  future  research. 


CHAPTER  2 


INFORMATION  OUTDATING:  PERISHING  OR  ENHANCEMENT 

The purpose of this chapter is to investigate the process of
outdating of information in a dynamic, probabilistic environment. We will
investigate the manner in which the outdating of information depends on
the dynamics of the environment, the decision for which the information
is used, and the characteristics of the information itself. The possible
patterns of the process under various conditions are studied.

2.1  Formulation  of  the  Problem 

The outdating of information refers to the changes over time
of the expected value of a piece of information about an uncertain
variable at a given time. Formally, let $v(s,d)$ denote the payoff of a
decision D, where $s$ is the vector of uncertain states and $d$ is the
vector of decisions to be set by the decision maker. In a dynamic,
probabilistic environment $s$ would be changing over time in an uncertain
manner. The payoff to the decision maker, if the decision is made at time
$t$, is $v(s(t),d)$. Suppose now that the decision maker is offered
perfect information about $s(0)$. The value of this information would
depend, in general, on the time $t$ when the decision maker wants to (or must)
make the decision. We refer to this information as unfresh, delayed, or
old (as opposed to fresh, prompt, or new information). We often expect
that the value of this information will be diminishing with $t$. This is
called information perishing [5]. The process of information outdating
depends, in general, on the dynamics of $s(t)$, on the decision for which
the information is used (payoff function $v$), and on the characteristics
of the information itself. The manner in which each of these factors
influences the outdating of information is studied in this chapter.

In a more general case, the payoff function itself may be
changing over time. However, since we are primarily interested in the
dynamics of information, we exclude that possibility here.

2.2  Notation 

The following notation is used throughout the research:

$v(s,d)$ : Payoff function.

$s = s(t)$ : State vector (stochastic).

$z(t) = \eta(s(t))$ : Information structure (observation).
$\eta$ may be a noiseless (many-to-one) or
noisy (one-to-many) mapping. $z$ denotes
the realization of $\eta$ over $s$.

$y(t)$ : Information available at time $t$.
$y(t) = z(t)$ if information is fresh.
$y(t) = z(t-\tau)$ if information is $\tau$
units old (delayed).

2.3 Information Outdating: a priori vs. a posteriori


It is very helpful to distinguish, at the outset, between a priori
and a posteriori outdating of information. Both notions are used
throughout this research. A priori outdating of information refers
to the changes in the a priori expected value of information about
$s(t-\tau)$ for making the decision at time $t$. The actual realization of
$s(t-\tau)$ is not known when the expected value of the information is
evaluated. This is the case, for instance, when information about the
state is available with a delay $\tau$. We may define:

$V_{\eta(t-\tau)}(t)$ = expected value of information structure
$\eta(s(t-\tau))$ for making the decision at time $t$. (2.1)

We may expect $V_{\eta(t-\tau)}(t)$ to be decreasing as $\tau$ increases (see Fig. 2.1).


Figure 2.1 A priori outdating of information.


A posteriori outdating of information refers to the case
where the actual realization of the observation is known and we are
concerned with the usefulness of this information (data) as time passes.
Let us define:

$V'_{z(t_0)}(t)$ = expected gain at time $t$ of using an
already known piece of information, namely $z(t_0)$. (2.2)

Note that $V'_{z(t_0)}(t)$ depends, in general, on $z(t_0)$ (see Fig. 2.2).


Figure 2.2 A posteriori outdating of information.


It is clear that $V_{\eta(t-\tau)}(t)$ is the a priori expected value of
$V'_{z(t-\tau)}(t)$ over all values of $z(t-\tau)$.

2.4  Information  Perishing  Rate 


Suppose that $V_{\eta(t-\tau)}(t)$ is a decreasing function of $\tau$. This
is called "Information Perishing" [5], and the "Rate of Information
Perishing" is defined as:

$$\rho = \frac{\text{expected value of information with delay } \tau}{\text{expected value of information with delay } \tau+1} = \frac{V_{\eta(t-\tau)}(t)}{V_{\eta(t-\tau-1)}(t)} \tag{2.3}$$

This definition is useful if $\rho$ is a function of $\tau$ but not $t$. If
$s(t)$ is stationary and the payoff function $v$ does not change in
time, then $\rho$ is independent of $t$. Note that using the a posteriori
value of information ($V'$) instead of the a priori value of information ($V$)
in (2.3) will not give us a good measure of information perishing,
because a rate so defined would depend not only on $t - t_0$, but
also on $z(t_0)$, namely the realization of the observation at time $t_0$.
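
Definition (2.3) can be applied directly to any tabulated profile of a priori values. The sketch below is only an illustration: the value profile is hypothetical (a geometric decay chosen for convenience), and the function name is not from the source.

```python
def perishing_rates(values):
    """Return rho(tau) = V(tau) / V(tau + 1) for consecutive delays, as in (2.3).

    values[tau] is the a priori expected value of information with delay tau;
    every entry after the first must be nonzero.
    """
    return [values[k] / values[k + 1] for k in range(len(values) - 1)]


# Hypothetical profile V(tau) = 4 * 0.9**(2 * tau): the rate is the same
# for every delay and equals 1 / 0.9**2.
values = [4.0 * 0.9 ** (2 * tau) for tau in range(6)]
rates = perishing_rates(values)
print(rates)
```

A rate above one means the information is perishing; a rate that does not vary with the delay corresponds to a geometric loss of value.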

2.5  Limitations  and  Assumptions 

One important limitation of dealing with information is that it
cannot be measured like other commodities. One cannot assign a meaningful
quantity to a piece of information (as a measure of its information
content) in such a manner that the information with the larger quantity is
always more valuable than the one with the smaller quantity. This is because
the value of a piece of information is not independent of the decision for
which the information is used. In particular, information structure
$\eta_1$ may be more valuable than $\eta_2$ for making one decision, while $\eta_2$ is
more valuable than $\eta_1$ for another decision.* Therefore we have to specify
the decision, or equivalently, the payoff function $v(s,d)$ (and also
the utility function when risk aversion is significant). The results
would, of course, be limited to the particular payoff function which is
used. The more general the payoff function, the more general the results
will be. The mathematical complexities, however, often limit our choices
of the payoff function. In almost all of this study we assume a quadratic
payoff function, which is a good approximation in most practical situations:

$$v(s,d) = s' G d + \tfrac{1}{2} \, d' H d \tag{2.4}$$

where $s$ and $d$ are vectors and $G$ and $H$ are matrices of appropriate
dimensions. $H$ is assumed to be negative definite to guarantee the
existence of a maximum point. Adding any function of $s$ alone to
$v(s,d)$ in (2.4) will change none of our results because it will not
change the optimal decision, although the payoff will be different.

*Although one can find information structures $\eta_1$ and $\eta_2$ such that $\eta_1$
is always (for every potential user) more valuable than $\eta_2$ ($\eta_1$ "more
informative" than $\eta_2$), this relation cannot order all information
structures $\eta_1$ and $\eta_2$.

The dynamic systems which we study here are mostly linear Markovian.
Some of the results, however, are true for any system. Autoregressive
systems have also been studied. The systems are assumed to be stationary.
We can show, however, that there is no fundamental difficulty in
extending the results to nonstationary systems. Another assumption concerns
the distribution of $s(t)$. It is often assumed that $s(t)$ has a normal
distribution. This assumption simplifies the results and is not an
unreasonable assumption because of the additive structure of the dynamic
systems studied.

2.6  A  Priori  Outdating  of  Information 

Throughout the rest of this chapter we will study the a priori
outdating of information. In doing so, however, we have to find the a
posteriori outdating of information as well. Nevertheless, our main emphasis
is on the a priori case. In the following chapters, where we study the
recovery of information, the two cases are studied separately.

Assuming a quadratic payoff function

$$v(s,d) = s' G d + \tfrac{1}{2} \, d' H d \tag{2.5}$$

the expected payoff of a decision $d$, given information $y$, is*

$$\langle v(s,d) \mid y, \mathcal{E} \rangle = \langle s' \mid y, \mathcal{E} \rangle \, G d + \tfrac{1}{2} \, d' H d \tag{2.6}$$

The optimal decision $\hat{d}$ (given information $y$) is obtained by setting
the derivative of (2.6) with respect to $d$ to zero:

$$\frac{\partial}{\partial d} \langle v(s,d) \mid y, \mathcal{E} \rangle = G' \langle s \mid y, \mathcal{E} \rangle + H d = 0$$

or

$$\hat{d} = -H^{-1} G' \, \bar{s}(y) \tag{2.7}$$

where $\bar{s}(y) = \langle s \mid y, \mathcal{E} \rangle$ is the posterior mean of $s$ given information $y$.
Substituting $\hat{d}$ into (2.6), the maximum expected payoff, given $y$, is:

$$\langle v(s, \hat{d}(y)) \mid y, \mathcal{E} \rangle = -\tfrac{1}{2} \, \bar{s}'(y) \, G H^{-1} G' \, \bar{s}(y) \tag{2.8}$$

We assume that $s(t)$ is stationary and $\langle s(t) \mid \mathcal{E} \rangle = 0$. The assumption
that $\langle s(t) \mid \mathcal{E} \rangle = 0$ is not restrictive because if $\langle s(t) \mid \mathcal{E} \rangle = a \neq 0$,
we can always make a change of variable $s_1(t) = s(t) - a$, where $s_1(t)$
has zero mean, and the problem will remain basically the same. By this
assumption the maximum expected payoff without information, from (2.8), is
zero because $\bar{s} = \langle s(t) \mid \mathcal{E} \rangle = 0$. Now suppose that our information
$y(t)$ is the result of observation $\eta$ with delay $\tau$,

$$y(t) = z(t-\tau) = \eta(s(t-\tau)) \tag{2.9}$$

*We use inferential notation which explicitly shows the state of
information. $\{x \mid y, \mathcal{E}\}$ denotes the probability distribution of $x$ given
observation $y$ and our prior knowledge $\mathcal{E}$. $\langle x \mid y, \mathcal{E} \rangle$ and ${}^{v}\langle x \mid y, \mathcal{E} \rangle$
denote the expected value and the variance of $x$ given $(y, \mathcal{E})$,
respectively.

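From (2.8), the maximum expected payoff with this information is

$$\langle v(s(t), \hat{d}(y(t))) \mid y(t) = \eta(s(t-\tau)), \mathcal{E} \rangle = \bar{s}'(y) \, M \, \bar{s}(y) \tag{2.10}$$

where

$$\bar{s}(y) = \langle s(t) \mid y(t) = \eta(s(t-\tau)), \mathcal{E} \rangle \tag{2.11}$$

and

$$M = -\tfrac{1}{2} \, G H^{-1} G' \tag{2.12}$$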
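
The relations (2.6), (2.7), (2.10) and (2.12) can be checked numerically on a small case. The sketch below uses hypothetical 2x2 matrices $G$ and $H$ (with $H$ symmetric and negative definite) and a hypothetical posterior mean; the helper names are illustrative only.

```python
def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def mat_vec(a, x):
    return [dot(row, x) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def expected_payoff(s_bar, d, g, h):
    """Right-hand side of (2.6): s_bar' G d + 1/2 d' H d."""
    return dot(s_bar, mat_vec(g, d)) + 0.5 * dot(d, mat_vec(h, d))


def optimal_decision(s_bar, g, h):
    """Equation (2.7): d_hat = -H^{-1} G' s_bar."""
    gt_s = mat_vec(transpose(g), s_bar)
    return [-x for x in mat_vec(inv2(h), gt_s)]


def value_given_posterior_mean(s_bar, g, h):
    """s_bar' M s_bar with M = -1/2 G H^{-1} G', as in (2.10) and (2.12)."""
    gt_s = mat_vec(transpose(g), s_bar)
    return -0.5 * dot(gt_s, mat_vec(inv2(h), gt_s))


# Hypothetical data; H is symmetric and negative definite.
G = [[1.0, 0.5], [0.0, 2.0]]
H = [[-2.0, 0.5], [0.5, -1.0]]
S_BAR = [1.0, -0.5]

d_hat = optimal_decision(S_BAR, G, H)
best = expected_payoff(S_BAR, d_hat, G, H)
print(d_hat, best, value_given_posterior_mean(S_BAR, G, H))
```

The payoff attained at `d_hat` agrees with the quadratic form in $M$, and any perturbation of `d_hat` lowers the expected payoff because $H$ is negative definite.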


Since the expected payoff with no information is zero, (2.10) gives us
the value of information $y$, or according to our definition (2.2), the
a posteriori value of information $V'_{z(t-\tau)}(t)$. Since $s(t)$ is assumed
to be stationary, $V'_{z(t-\tau)}(t)$ does not depend on $t$, and we show it by

$$V'_z(\tau) = \bar{s}'(z(\tau)) \, M \, \bar{s}(z(\tau)) \tag{2.13}$$

Note that $V'_z(\tau)$ depends on $z$, namely the realization of the
observation $\eta$ at $t-\tau$. The a priori value of information structure $\eta$ with
delay $\tau$, $V_{\eta(t-\tau)}(t)$, can be obtained by taking the expected value of
$V'_z(\tau)$ over values of $z(\tau)$. Again by stationarity of $s(t)$,
this value does not depend on $t$ and we will show it by $V_\eta(\tau)$:

$$V_\eta(\tau) = \langle V'_z(\tau) \mid \mathcal{E} \rangle = \langle \bar{s}'(z(\tau)) \, M \, \bar{s}(z(\tau)) \mid \mathcal{E} \rangle \tag{2.14}$$


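For a random vector $x$ and a matrix $A$ we have

$$\langle x' A x \mid \mathcal{E} \rangle = \mathrm{trace}(A \, \Sigma_x) + \langle x \mid \mathcal{E} \rangle' \, A \, \langle x \mid \mathcal{E} \rangle \tag{2.15}$$

where $\Sigma_x$ is the covariance matrix of $x$. Therefore (2.14) can be
written as:

$$V_\eta(\tau) = \mathrm{tr}(M \, \Sigma_{\bar{s}}) + \langle \bar{s}(z(\tau)) \mid \mathcal{E} \rangle' \, M \, \langle \bar{s}(z(\tau)) \mid \mathcal{E} \rangle$$

But

$$\langle \bar{s}(z(\tau)) \mid \mathcal{E} \rangle = \langle \langle s(t) \mid z(t-\tau), \mathcal{E} \rangle \mid \mathcal{E} \rangle = \langle s(t) \mid \mathcal{E} \rangle = 0$$

Therefore,

$$V_\eta(\tau) = \mathrm{tr}(M \, \Sigma_{\bar{s}}) \tag{2.16}$$

where

$$\Sigma_{\bar{s}} = {}^{v}\langle \bar{s}(z(\tau)) \mid \mathcal{E} \rangle \tag{2.17}$$

Note that $\Sigma_{\bar{s}}$ depends only on $\tau$ (and not on $z$). Equation (2.16)
gives us the expected value of information structure $\eta$ with delay $\tau$.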
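
Identity (2.15) is the step that turns the expected quadratic form (2.14) into the trace expression (2.16). The sketch below checks it on a small discrete distribution; the support points, probabilities, and matrix are hypothetical, and the helper names are illustrative.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(m, x):
    return [dot(row, x) for row in m]


def trace_of_product(a, b):
    """trace(A B) for square matrices of equal size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def mean_and_covariance(points, probs):
    """Mean vector and covariance matrix of a discrete random vector."""
    n = len(points[0])
    mean = [sum(p * x[i] for x, p in zip(points, probs)) for i in range(n)]
    cov = [[sum(p * (x[i] - mean[i]) * (x[j] - mean[j])
                for x, p in zip(points, probs))
            for j in range(n)] for i in range(n)]
    return mean, cov


def expected_quadratic_form(points, probs, a):
    """Direct evaluation of the expectation of x' A x."""
    return sum(p * dot(x, mat_vec(a, x)) for x, p in zip(points, probs))


# Hypothetical three-point distribution and matrix.
POINTS = [[1.0, 2.0], [-1.0, 0.0], [3.0, -1.0]]
PROBS = [0.2, 0.5, 0.3]
A = [[2.0, 1.0], [0.0, 3.0]]

mean, cov = mean_and_covariance(POINTS, PROBS)
lhs = expected_quadratic_form(POINTS, PROBS, A)
rhs = trace_of_product(A, cov) + dot(mean, mat_vec(A, mean))
print(lhs, rhs)
```

When the mean is zero, as for the posterior mean under the zero-mean assumption, the second term vanishes and only the trace term remains.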


The effect of the payoff function on the value of information is reflected
in matrix $M$. The dynamics of the system, as well as the characteristics
of the information structure and the delay $\tau$, are represented by
$\Sigma_{\bar{s}}$, the matrix of the covariances of the posterior means. To see the effect
of each factor separately, let us first calculate the posterior mean,
$\bar{s}(z(\tau))$. Expanding over $s(t-\tau)$ we have

$$\bar{s}(z(\tau)) = \langle s(t) \mid z(t-\tau), \mathcal{E} \rangle = \int_{s(t-\tau)} \langle s(t) \mid s(t-\tau), z(t-\tau), \mathcal{E} \rangle \, \{ s(t-\tau) \mid z(t-\tau), \mathcal{E} \} = \int_{s(t-\tau)} \langle s(t) \mid s(t-\tau), \mathcal{E} \rangle \, \{ s(t-\tau) \mid z(t-\tau), \mathcal{E} \} \tag{2.18}$$

$\langle s(t) \mid s(t-\tau), \mathcal{E} \rangle$ is a function of $s(t-\tau)$ and $\tau$. Using a linear
approximation for this function, and in view of stationarity of $s(t)$,
we can write:

$$\langle s(t) \mid s(t-\tau), \mathcal{E} \rangle = R(\tau) \, s(t-\tau) + q(\tau)$$

where $R(\tau)$ is a square matrix and $q(\tau)$ is a vector. Since
$\langle s(t) \mid \mathcal{E} \rangle = 0$, it follows that $q(\tau) = 0$, and we have

$$\langle s(t) \mid s(t-\tau), \mathcal{E} \rangle = R(\tau) \, s(t-\tau) \tag{2.19}$$

We will show later that (2.19) is exact if $s(t)$ has normal distribution.
Substituting from (2.19) into (2.18) we have

$$\langle s(t) \mid z(t-\tau), \mathcal{E} \rangle = R(\tau) \, \langle s(t-\tau) \mid z(t-\tau), \mathcal{E} \rangle \tag{2.20}$$

Therefore,

$$\Sigma_{\bar{s}} = {}^{v}\langle \langle s(t) \mid z(t-\tau), \mathcal{E} \rangle \mid \mathcal{E} \rangle = {}^{v}\langle R(\tau) \, \langle s(t-\tau) \mid z(t-\tau), \mathcal{E} \rangle \mid \mathcal{E} \rangle = R(\tau) \; {}^{v}\langle \langle s(t-\tau) \mid z(t-\tau), \mathcal{E} \rangle \mid \mathcal{E} \rangle \; R'(\tau) = R(\tau) \, \Sigma_{\bar{s}_0} \, R'(\tau) \tag{2.21}$$

where $\Sigma_{\bar{s}_0}$ is the matrix of covariances of posterior means with fresh
information. Note that by the stationarity assumption, $\Sigma_{\bar{s}_0}$ does not
depend on the time of observation $t-\tau$. Equation (2.21) separates
the effects of the information structure ($\eta$) and the dynamics of the
system. Substituting from (2.21) into (2.16) we have

$$V_\eta(\tau) = \mathrm{tr}\left[ M \, R(\tau) \, \Sigma_{\bar{s}_0} \, R'(\tau) \right] \tag{2.22}$$

Equation (2.22) shows how the value of information is related to the
decision (payoff function), the dynamics of $s(t)$, the information structure
$\eta$, and the delay $\tau$. The payoff function influences the value of
information through matrix $M = -\tfrac{1}{2} G H^{-1} G'$. Information structure
$\eta$ is represented by $\Sigma_{\bar{s}_0}$, namely the matrix of covariance of posterior
means with fresh information. The delay $\tau$ and the dynamics of the
system are reflected in the matrix $R(\tau)$. $R(\tau)$ is the coefficient
of the linear approximation of the mean of $s(t)$ in terms of $s(t-\tau)$.
We now show that if $s(t)$ has normal distribution, (2.22) is exact and
$R(\tau)$ is the "multiple correlation" matrix between $s(t)$ and $s(t-\tau)$.

For normal $s(t)$, we have:*

$$\langle s(t) \mid s(t-\tau), \mathcal{E} \rangle = \langle s(t) \mid \mathcal{E} \rangle + \Sigma_{s(t),s(t-\tau)} \, \Sigma^{-1}_{s(t-\tau),s(t-\tau)} \, \left[ s(t-\tau) - \langle s(t-\tau) \mid \mathcal{E} \rangle \right]$$

Since $\langle s(t) \mid \mathcal{E} \rangle = \langle s(t-\tau) \mid \mathcal{E} \rangle = 0$,

$$\langle s(t) \mid s(t-\tau), \mathcal{E} \rangle = \Sigma_{s(t),s(t-\tau)} \, \Sigma^{-1}_{s(t-\tau),s(t-\tau)} \, s(t-\tau)$$

Therefore (2.19) is exact for this case with $R(\tau)$ being

$$R(\tau) = \Sigma_{s(t),s(t-\tau)} \, \Sigma^{-1}_{s(t-\tau),s(t-\tau)} \tag{2.23}$$

This matrix is called the multiple correlation matrix between $s(t)$
and $s(t-\tau)$. If $s(t)$ is a scalar, $R(\tau)$ reduces to the correlation
coefficient between $s(t)$ and $s(t-\tau)$.

*See updating relations for multivariate normal distributions in [2], for
example.

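For normal $s(t)$ we can find $\Sigma_{\bar{s}_0}$ in terms of the covariance
matrices $\Sigma_{s,z}$ and $\Sigma_{z,z}$. The prior variance $\Sigma_s$ ($= \Sigma_{s,s}$) can be
decomposed into the expected value of the posterior variance $\Sigma_{s|z}$ and the
variance of the posterior mean ($\Sigma_{\bar{s}_0}$) [16]:

$$\Sigma_s = \langle \Sigma_{s|z} \mid \mathcal{E} \rangle + \Sigma_{\bar{s}_0}$$

or

$$\Sigma_{\bar{s}_0} = \Sigma_s - \langle \Sigma_{s|z} \mid \mathcal{E} \rangle \tag{2.24}$$

(2.24) is true in general. For $s$ normal we have [2]

$$\Sigma_{s|z} = \Sigma_s - \Sigma_{s,z} \, \Sigma_z^{-1} \, \Sigma_{z,s}$$

Since $\Sigma_{s|z}$ does not depend on $z$, from (2.24) we have

$$\Sigma_{\bar{s}_0} = \Sigma_s - \Sigma_{s|z} = \Sigma_{s,z} \, \Sigma_z^{-1} \, \Sigma_{z,s} \tag{2.25}$$

Note that for this case, $\Sigma_{\bar{s}_0}$ is the difference between the prior and
the posterior variance of $s$ and can be written in terms of $\Sigma_{s,z}$ and
$\Sigma_z$. Therefore, for a normal state we have:

$$V_\eta(\tau) = \mathrm{tr}\left[ M \, R(\tau) \, \Sigma_{\bar{s}_0} \, R'(\tau) \right] \quad \text{(exact)}$$

where $R(\tau) = \Sigma_{s(t),s(t-\tau)} \, \Sigma^{-1}_{s(t-\tau)}$ is the multiple correlation
between $s(t)$ and $s(t-\tau)$, and $\Sigma_{\bar{s}_0}$ is the reduction in variance as a
result of the observation.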

2.7  Information  Outdating  in  Linear  Markovian  Systems 

The results obtained in Section 2.6 are true for dynamic systems
in general. A special case of interest is the linear Markovian system
characterized by the equation

$$s(t) = A \, s(t-1) + \varepsilon(t) \tag{2.26}$$

where $s(t)$ is the vector of states at time $t$, $A$ is a square matrix,
and $\varepsilon$ is a vector of random "noises." We assume that $\varepsilon(t), \varepsilon(t-1), \ldots$
are independent and identically distributed and have zero means (for all
$t$). We also assume that $s(t)$ is stationary. This assumption requires
that all the eigenvalues of $A$ lie inside the unit circle. Substituting
for $s(t-1)$ in (2.26) from $s(t-1) = A \, s(t-2) + \varepsilon(t-1)$, we have

$$s(t) = A^2 s(t-2) + A \, \varepsilon(t-1) + \varepsilon(t)$$

By similar substitutions for $s(t-2), \ldots, s(t-\tau+1)$ we get

$$s(t) = A^\tau s(t-\tau) + A^{\tau-1} \varepsilon(t-\tau+1) + A^{\tau-2} \varepsilon(t-\tau+2) + \cdots + A \, \varepsilon(t-1) + \varepsilon(t) \tag{2.27}$$

From the independence of the $\varepsilon(t)$'s and the stationarity assumption
it follows that $\varepsilon(t)$ is independent of $s(t-1), s(t-2), \ldots$
Therefore from (2.27) we have

$$\langle s(t) \mid s(t-\tau), \mathcal{E} \rangle = A^\tau s(t-\tau) \tag{2.28}$$
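
The unrolled form (2.27) can be compared with a step-by-step simulation of (2.26). In the sketch below the matrix, the starting state, and the Gaussian noise draws are all hypothetical; the conditional mean (2.28) is obtained by dropping the zero-mean noise terms.

```python
import random


def mat_vec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(a, k):
    """A**k for a nonnegative integer k (identity when k == 0)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


def vec_add(x, y):
    return [p + q for p, q in zip(x, y)]


def step_by_step(a, s_start, noises):
    """Apply s(t) = A s(t-1) + e(t) once per noise vector, as in (2.26)."""
    s = list(s_start)
    for e in noises:
        s = vec_add(mat_vec(a, s), e)
    return s


def unrolled(a, s_start, noises):
    """Equation (2.27): A**tau s(t-tau) + sum over k of A**k e(t-k)."""
    tau = len(noises)  # noises are ordered e(t-tau+1), ..., e(t)
    total = mat_vec(mat_pow(a, tau), s_start)
    for k in range(tau):
        total = vec_add(total, mat_vec(mat_pow(a, k), noises[tau - 1 - k]))
    return total


def conditional_mean(a, s_past, tau):
    """Equation (2.28): the noise terms have zero mean and drop out."""
    return mat_vec(mat_pow(a, tau), s_past)


# Hypothetical stable system (eigenvalue modulus 0.6) and noise draws.
A = [[0.6, 0.3], [-0.2, 0.5]]
rng = random.Random(0)
NOISES = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(5)]
S_START = [1.0, -2.0]

print(step_by_step(A, S_START, NOISES))
print(unrolled(A, S_START, NOISES))
print(conditional_mean(A, S_START, len(NOISES)))
```

The first two printed vectors agree up to rounding, and the third shows how little of the starting state survives after five steps when the eigenvalues lie well inside the unit circle.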

Comparing (2.28) with (2.19), $R(\tau)$ for the linear Markovian system of
(2.26) is

$$R(\tau) = A^\tau \quad \text{(exact)}$$

Substituting into (2.22) we find the (exact) value of information structure
$\eta$ with delay $\tau$ for our Markovian system:

$$V_\eta(\tau) = \mathrm{tr}\left[ M \, A^\tau \, \Sigma_{\bar{s}_0} \, (A')^\tau \right] \tag{2.29}$$

In the following we will investigate the patterns of $V_\eta(\tau)$ (as a
function of $\tau$) under various circumstances. All the results concern
the Markovian system.

Theorem 2.1. If information structure $\eta$ is perfect ($\eta(s(t)) = s(t)$),
then $V_\eta(\tau)$ (for the Markovian system) always decreases as $\tau$
increases (Information Perishing).

Proof :  From  (2.27)  and  by  the  statlonarlty  assumption,  j»(t)  can 
be  written  as 
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s(t)  -  e(t)  +  A  e(t-l)  +  A2e(t-2)  +  . . . 


(2.30) 


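From this equation, and the independence of the $e(t)$'s, we have

$$\Sigma_s = V + A V A' + A^{2} V A'^{2} + \cdots \tag{2.31}$$

where $V$ is the covariance matrix of $e(t)$. Since information is
perfect,

$$\langle s(t) \mid \eta(s(t)), \mathcal{E} \rangle = \langle s(t) \mid s(t), \mathcal{E} \rangle = s(t)$$

Therefore,

$$\Sigma_{\tilde{s}_0} = \Sigma_s = \sum_{i=0}^{\infty} A^{i} V A'^{i} \tag{2.32}$$

Substituting into (2.29) we have:

$$V_{\eta(\tau)} = \operatorname{tr}\left[ M A^{\tau} \left( \sum_{i=0}^{\infty} A^{i} V A'^{i} \right) A'^{\tau} \right]
= \operatorname{tr}\left[ M \left( \sum_{i=\tau}^{\infty} A^{i} V A'^{i} \right) \right]
= \sum_{i=\tau}^{\infty} \operatorname{tr}\left( M A^{i} V A'^{i} \right)$$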
Since $M$ is positive definite (because $H$ is negative definite)
and $A^{i} V A'^{i}$ is positive definite, $\operatorname{tr}(M A^{i} V A'^{i})$ is positive [3].
Increasing $\tau$ by one removes one such positive term from the sum.
It follows, therefore, that $V_{\eta(\tau)}$ decreases as $\tau$ increases for all
values of $\tau$.
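As a quick check on the argument, consider the single-variable case (an illustrative special case, not worked in the text) with $A = \lambda$, $|\lambda| < 1$, noise variance $v$, and $M = m > 0$. Under these assumptions the series in the proof are geometric and can be summed explicitly:

```latex
\Sigma_s = \sum_{i=0}^{\infty} \lambda^{2i} v = \frac{v}{1-\lambda^{2}},
\qquad
V_{\eta(\tau)} = \sum_{i=\tau}^{\infty} m\,\lambda^{2i} v
             = \frac{m\,v\,\lambda^{2\tau}}{1-\lambda^{2}},
\qquad
\frac{V_{\eta(\tau)}}{V_{\eta(\tau+1)}} = \frac{1}{\lambda^{2}} > 1 .
```

The value of perfect information therefore falls by the constant factor $\lambda^{2}$ per unit of delay in this case.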


Figure 2.3. Value of perfect information.


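Theorem 2.2. If all the matrices $A$, $M$, and $\Sigma_{\tilde{s}_0}$ are nonnegative,
then $V_{\eta(\tau)}$ is always decreasing with $\tau$ and the rate of information
perishing ($\rho(\tau)$) is always greater than or equal to $1/\lambda_n^{2}$, where $\lambda_n$
is the largest eigenvalue of $A$:

$$\rho(\tau) = \frac{V_{\eta(\tau)}}{V_{\eta(\tau+1)}} \ge \frac{1}{\lambda_n^{2}}$$

Proof: Let $A^{\tau} \Sigma_{\tilde{s}_0} A'^{\tau} = B$; $B$ is nonnegative. Since $A$ is nonnegative,
if $\lambda_n$ is the largest eigenvalue of $A$, for any nonnegative vector
$b$ we have [3]:

$$A\,b \le \lambda_n b \tag{2.33}$$

Since all the columns of $B$ are nonnegative it follows that

$$A\,B \le \lambda_n B \tag{2.34}$$

Also from (2.33) we have

$$b' A' \le \lambda_n b' \qquad \forall\, b' \ge 0$$

Considering $b'$ as a row of $B$, it follows that

$$B A' \le \lambda_n B \tag{2.35}$$

From (2.34) and (2.35) we have

$$A B A' \le \lambda_n^{2} B \tag{2.36}$$

From (2.36), and in view of the nonnegativity of $M$, we have

$$V_{\eta(\tau+1)} = \operatorname{tr}(M A B A') \le \operatorname{tr}(M \lambda_n^{2} B) = \lambda_n^{2}\, V_{\eta(\tau)}$$

Therefore

$$\frac{V_{\eta(\tau)}}{V_{\eta(\tau+1)}} \ge \frac{1}{\lambda_n^{2}} \qquad \forall\, \tau \ge 0$$

Since $\lambda_n < 1$ (by the stationarity assumption), information is always
perishing with a rate greater than or equal to $1/\lambda_n^{2}$.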

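Theorem 2.3. If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are distinct eigenvalues of
matrix $A$, $V_{\eta(\tau)}$ can be written as

$$V_{\eta(\tau)} = \sum_{i,j=1}^{n} c_{ij}\, (\lambda_i \lambda_j)^{\tau} \tag{2.37}$$

where $c_{ij}$ is constant.

Proof: $A$ can be written as:

$$A = P \Lambda P^{-1}$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. From (2.29) we have:

$$V_{\eta(\tau)} = \operatorname{tr}\left( M \cdot P \Lambda^{\tau} P^{-1} \cdot \Sigma_{\tilde{s}_0} \cdot P'^{-1} \Lambda^{\tau} P' \right)$$

Since $\operatorname{tr}(X \cdot Y) = \operatorname{tr}(Y \cdot X)$ for arbitrary matrices $X$ and $Y$, we can
write

$$V_{\eta(\tau)} = \operatorname{tr}\left( P'MP \cdot \Lambda^{\tau} \cdot P^{-1} \Sigma_{\tilde{s}_0} P'^{-1} \cdot \Lambda^{\tau} \right)$$

Letting $T = P'MP$ and $Q = P^{-1} \Sigma_{\tilde{s}_0} P'^{-1}$, we have

$$V_{\eta(\tau)} = \operatorname{tr}\left( T \Lambda^{\tau} Q \Lambda^{\tau} \right)$$

Writing $T = [t_{ij}]$ and $Q = [q_{ij}]$ we have

$$\Lambda^{\tau} Q \Lambda^{\tau} = [\, q_{ij}\, \lambda_i^{\tau} \lambda_j^{\tau} \,]$$

and

$$V_{\eta(\tau)} = \operatorname{tr}\left( T \cdot \Lambda^{\tau} Q \Lambda^{\tau} \right)
= \sum_{i=1}^{n} \sum_{j=1}^{n} t_{ij}\, q_{ji}\, \lambda_i^{\tau} \lambda_j^{\tau}
= \sum_{i,j=1}^{n} c_{ij}\, (\lambda_i \lambda_j)^{\tau}$$

which is the desired result. Note that

$$c_{ij} = t_{ij} \cdot q_{ji}$$

or

$$C = [c_{ij}] = T \,\square\, Q' = T \,\square\, Q$$

Therefore the matrix $C$ of coefficients is the congruent product of
matrices $T$ and $Q$.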


From Eq. (2.37) we see that $V_{\eta(\tau)}$ is the weighted sum of the
geometric terms (in $\tau$) with the base of each term being the product
of two eigenvalues of matrix $A$. This indicates that the eigenvalues
of $A$ play an important role in determining the behavior of $V_{\eta(\tau)}$
(as a function of $\tau$). The effects of the payoff function and the
information structure are reflected in the constant coefficients, $c_{ij}$.
In the following we will explore the important modes of behavior of
$V_{\eta(\tau)}$.


Remark 2.1. If $A$ is diagonal, and either $M$ or $\Sigma_{\tilde{s}_0}$ is also
diagonal, then

$$V_{\eta(\tau)} = \sum_{i=1}^{n} c_{ii}\, \lambda_i^{2\tau} \tag{2.38}$$

This corresponds to the case where either no interaction among information
components exists ($A$ and $\Sigma_{\tilde{s}_0}$ diagonal), or if such an interaction
exists it is eliminated by the special form of the payoff function
($M$ diagonal). For this case it is easy to show that

$$\frac{1}{\lambda_n^{2}} \le \rho(\tau) = \frac{V_{\eta(\tau)}}{V_{\eta(\tau+1)}} \le \frac{1}{\lambda_1^{2}} \qquad \forall\, \eta, \tau \tag{2.39}$$

and

$$\rho(\tau) \ge \rho(\tau+1) \tag{2.40}$$

where $\lambda_1$ and $\lambda_n$ are the smallest and the largest eigenvalues of $A$,
respectively. The intuitive interpretation of (2.39) and (2.40) will
be clear if we note that the rate of information perishing for the case
of a single variable system,

$$s(t) = \lambda\, s(t-1) + e(t)$$

is $1/\lambda^{2}$. Since, in this case, no interaction among the variables (or
information components) exists, the rate of information perishing is
between the rate of information perishing for the fastest perishing component,
($1/\lambda_1^{2}$), and the rate for the slowest perishing component, ($1/\lambda_n^{2}$). As
time passes, the components corresponding to larger perishing rates
perish more and the share of slower perishing components in $\rho(\tau)$
becomes larger. This causes $\rho(\tau)$ to decrease as $\tau$ increases. For
very large $\tau$, the component corresponding to the slowest perishing
rate ($1/\lambda_n^{2}$) becomes dominant and $\rho(\tau)$ approaches $1/\lambda_n^{2}$ (see Fig. 2.4).

Figure 2.4. $\rho(\tau)$ for diagonal $A$ and $M$ (or $A$ and $\Sigma_{\tilde{s}_0}$).
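The bounds (2.39) and the monotonicity (2.40) can be checked numerically for the diagonal case. The sketch below assumes positive coefficients $c_{ii}$ and positive eigenvalues; the particular numbers in `c` and `lam` are hypothetical.

```python
def rate_of_perishing(c, lam, tau):
    """rho(tau) = V(tau) / V(tau + 1), with V(tau) = sum_i c[i] * lam[i]**(2*tau)
    as in Eq. (2.38)."""
    v_now = sum(ci * li ** (2 * tau) for ci, li in zip(c, lam))
    v_next = sum(ci * li ** (2 * (tau + 1)) for ci, li in zip(c, lam))
    return v_now / v_next

c = [1.0, 0.5, 2.0]      # hypothetical positive coefficients c_ii
lam = [0.3, 0.6, 0.9]    # hypothetical eigenvalues of a diagonal A

rates = [rate_of_perishing(c, lam, tau) for tau in range(40)]
lower = 1.0 / max(lam) ** 2   # rate of the slowest perishing component
upper = 1.0 / min(lam) ** 2   # rate of the fastest perishing component
```

With these inputs `rates` starts between the two bounds, decreases with delay, and settles near `lower`, which is the behavior sketched in Fig. 2.4.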
Remark 2.2. If $\eta$ is not perfect, $V_{\eta(\tau)}$ may be increasing with
$\tau$ in some intervals. Two examples of such cases are given below.

Example 2.1. Let

$$A = \begin{bmatrix} .2 & 0 \\ 0 & .9 \end{bmatrix}
\quad \text{and} \quad
M = \begin{bmatrix} 1/2 & -1 \\ -1 & 5/2 \end{bmatrix}$$

$M$ is positive definite and the eigenvalues of $A$ are less than 1.


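The choice of $\Sigma_{\tilde{s}_0}$ may not be totally arbitrary since $\Sigma_{\tilde{s}_0}$ may depend on
$A$ (because $\Sigma_s$ depends on $A$). However, by appropriate choice of $\Sigma_e$
we have a large range from which to choose $\Sigma_s$. From (2.31),

$$\Sigma_s = \Sigma_e + A\, \Sigma_e A' + A^{2}\, \Sigma_e A'^{2} + \cdots$$

Letting

$$\Sigma_e = \begin{bmatrix} v_1 & 0 \\ 0 & v_2 \end{bmatrix}$$

we find

$$\Sigma_s = \begin{bmatrix} \dfrac{v_1}{1 - a_{11}^{2}} & 0 \\ 0 & \dfrac{v_2}{1 - a_{22}^{2}} \end{bmatrix}$$

where $a_{11} = .2$ and $a_{22} = .9$ are elements of matrix $A$. Since
the choices of $v_1$ and $v_2$ are arbitrary (subject to being positive),
$\sigma_{s_1}^{2}$ and $\sigma_{s_2}^{2}$ can, in fact, be chosen arbitrarily. Let $\Sigma_s$ be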
To find $\Sigma_{\tilde{s}_0}$ we need to have the covariance matrix of $(s, z)$. Let
this matrix be:


The only condition on this matrix is that it must be positive semidefinite.
This condition is satisfied. Assuming that $s$ and $z$ have normal
distribution, $\Sigma_{\tilde{s}_0}$ can be calculated from (2.25):


Using this matrix we can now calculate the value of information. From
Theorem 2.3 we have
Notice that the value of information with one unit delay (and even two
units  delay)  is  greater  than  the  value  of  fresh  information.  A  more 
interesting  case  is  given  in  Example  2.2  below. 

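Example 2.2. Let $M$ and $\Sigma_{\tilde{s}_0}$ be the same as in Example 2.1, but

$$A = \begin{bmatrix} .9 & 0 \\ 0 & -.9 \end{bmatrix}$$

Notice that the same $\Sigma_{\tilde{s}_0}$ as in Example 2.1 can be chosen with the
new matrix $A$, because $\Sigma_s$ could be chosen independently of $a_{11}$
and $a_{22}$. For this case we have:

$$T \,\square\, Q = M \,\square\, \Sigma_{\tilde{s}_0} = \begin{bmatrix} 1 & -1 \\ -1 & 5/2 \end{bmatrix}$$

$$V_{\eta(\tau)} = 3.5\,(.81)^{\tau} - 2\,(-.81)^{\tau}$$

$V_{\eta(\tau)}$ is shown in Fig. 2.6.

Figure 2.6. $V_{\eta(\tau)}$ in Example 2.2.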
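The closed form quoted in Example 2.2 can be evaluated directly to see the oscillation. The sketch below also rebuilds the same numbers from Theorem 2.3 for a diagonal $A$ (where $P = I$, so the coefficients are the element-wise products of $M$ and $\Sigma_{\tilde{s}_0}$). The matrix `Sigma` is an assumption: its entries are not legible in this copy, and the values used here were chosen only because they reproduce the coefficients 3.5 and -2 of the stated closed form.

```python
def closed_form(tau):
    """V(tau) as stated in Example 2.2."""
    return 3.5 * 0.81 ** tau - 2.0 * (-0.81) ** tau

def value_from_coefficients(M, Sigma, eigenvalues, tau):
    """Theorem 2.3 for diagonal A (P = I):
    V(tau) = sum_ij m_ij * sigma_ij * (lam_i * lam_j)**tau."""
    n = len(eigenvalues)
    return sum(M[i][j] * Sigma[i][j] * (eigenvalues[i] * eigenvalues[j]) ** tau
               for i in range(n) for j in range(n))

M = [[0.5, -1.0], [-1.0, 2.5]]      # payoff matrix from Example 2.1
Sigma = [[2.0, 1.0], [1.0, 1.0]]    # ASSUMED covariance of the posterior mean

example_2_1 = [value_from_coefficients(M, Sigma, [0.2, 0.9], t) for t in range(6)]
example_2_2 = [value_from_coefficients(M, Sigma, [0.9, -0.9], t) for t in range(6)]
```

Under this assumed `Sigma`, `example_2_1` rises above its fresh value at delays one and two, and `example_2_2` alternates between high and low values, which is the qualitative behavior described in the two examples.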
Remark 2.3. It is easy to show that if some of the eigenvalues
of $A$ are complex, $V_{\eta(\tau)}$ will include, in general, terms of the form
$\beta^{\tau} \sin^{2}(\theta\tau + \alpha)$, that is, damped oscillatory terms.

We have seen in the above examples that the value of information does
not always perish as it becomes outdated. This result is rather
counter-intuitive considering that our systems have been Markovian. We will give
an intuitive interpretation for this mode of behavior of the value of
information in Section 2.9, where we study the autoregressive systems. The
examples also show that the value of information may have various patterns
which depend, to a large extent, on the structure of the dynamic system
(eigenvalues of matrix $A$). It must be noted, however, that other
factors, namely the payoff function and the information structure
itself, can also have important influences on the behavior of the value of
information. We have shown, for example, that if the information is perfect,
then it will always be perishing, regardless of matrices $A$ or $M$.

The enhancement of information with delay has an interesting
implication for the timing of information acquisition. Suppose that the time
of the decision is fixed. We can obtain information about the state of
the system at a point in the past such that the expected value of this
information is maximum at the time of the decision. If the timing of
the decision is flexible, however, we can buy information and choose the
time of the decision according to the information, such that the payoff
is maximized.


2.8  Information Outdating in the Bidding Example

In this section we will study the outdating of information in the
bidding example of Chapter 1. For the bidding model itself we will use
the model introduced by Howard [6]. According to this model, the
lowest competitive bid ($\ell$) and the cost (to us) of performing the
contract ($p$) are uncertain. The optimum bid must be decided based on
information about the uncertain variables. There is no uncertainty in the
time of the bidding in Howard's model. We assume, however, that the
time of the bidding is uncertain and that our uncertain variables change
over time. As a result, our information about these variables becomes
outdated by passage of time.

Let $b$ denote our bid; the profit $v$ is

$$v(b, \ell, p) = \begin{cases} b - p & b < \ell \\ 0 & b > \ell \end{cases}$$


It is easy to show that the expected profit of bidding $b$ is

$$\langle v(b,\ell,p) \mid b, y, \mathcal{E} \rangle = \{ \ell > b \mid y, \mathcal{E} \} \cdot [\, b - \langle p \mid y, \mathcal{E} \rangle \,] \tag{2.41}$$

where $y$ denotes our information about the uncertain variables at the
bidding time. We assume, for simplicity, that our knowledge about the
lowest bid ($\ell$) does not change over time and we assign a uniform
distribution to this variable (see Fig. 2.7).

Figure 2.7. Probability distribution of the lowest competitive bid
(density $\{\ell \mid \mathcal{E}\} = 1/2$ for $0 \le \ell \le 2$).

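The expected profit from (2.41) reduces to

$$\langle v(b,\ell,p) \mid b, y, \mathcal{E} \rangle = \tfrac{1}{2}(2-b) \cdot [\, b - \langle p \mid y, \mathcal{E} \rangle \,] \tag{2.42}$$

The bid which maximizes the expected profit can be obtained by setting
the derivative of (2.42) with respect to $b$ to zero. We find

$$b = 1 + \tfrac{1}{2} \langle p \mid y, \mathcal{E} \rangle \tag{2.43}$$

and the maximum expected profit is

$$V_1(y) = \langle v(b,\ell,p) \mid y, \mathcal{E} \rangle = \tfrac{1}{2} \left( 1 - \tfrac{1}{2} \langle p \mid y, \mathcal{E} \rangle \right)^{2} \tag{2.44}$$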
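The optimization leading to (2.43) and (2.44) can be confirmed with a brute-force search over candidate bids. This is a minimal sketch; the function names and the value chosen for the expected cost are assumptions made for illustration.

```python
def expected_profit(bid, expected_cost):
    """Eq. (2.42): probability that the lowest competing bid exceeds `bid`
    (uniform on [0, 2]) times the expected margin."""
    return 0.5 * (2.0 - bid) * (bid - expected_cost)

def optimal_bid(expected_cost):
    """Eq. (2.43)."""
    return 1.0 + 0.5 * expected_cost

def max_expected_profit(expected_cost):
    """Eq. (2.44)."""
    return 0.5 * (1.0 - 0.5 * expected_cost) ** 2

expected_cost = 0.8                                  # hypothetical value of <p | y, E>
grid = [i / 1000.0 for i in range(2001)]             # candidate bids in [0, 2]
best_on_grid = max(grid, key=lambda b: expected_profit(b, expected_cost))
```

The grid maximizer coincides with the analytical bid of (2.43), and the profit at that bid equals (2.44).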

We assume, as was suggested in Chapter 1, that from our past experience
we know that the cost of performing the contract ($p$) has a constant mean
over time ($m$), and that its variation from its mean ($s = \Delta p$) changes
over time, according to the linear Markovian model of

$$s(t) = \lambda\, s(t-1) + e(t) \tag{2.45}$$

where $\lambda$ is a constant less than one, and $e(t)$ is a random noise with
zero mean. We assume that the $e(t)$'s are independent and identically
distributed. Now suppose that our information about the cost $p$ (or
equivalently $s$) at time $t$ is the cost at time zero

$$y(t) = s(0) = s_0$$

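From (2.44) the expected profit of bidding at time $t$ with this
information is

$$V_1(t, s_0) = \langle v(b(t), \ell, p(t)) \mid s_0, \mathcal{E} \rangle
= \tfrac{1}{2} \left[ 1 - \tfrac{1}{2} \langle p(t) \mid s_0, \mathcal{E} \rangle \right]^{2}
= \tfrac{1}{2} \left[ 1 - \tfrac{1}{2} \left( m + \langle s(t) \mid s_0, \mathcal{E} \rangle \right) \right]^{2} \tag{2.46}$$

but from (2.45) it is easy to show that

$$\langle s(t) \mid s_0, \mathcal{E} \rangle = \lambda^{t} s_0 \tag{2.47}$$

Therefore, we have

$$V_1(t, s_0) = \tfrac{1}{2} \left( 1 - \tfrac{m}{2} - \tfrac{1}{2} \lambda^{t} s_0 \right)^{2} \tag{2.48}$$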


$V_1(t, s_0)$, namely the maximum expected profit of the bidding at time $t$
with information $s_0$, is shown in Fig. 2.8.

Figure 2.8. Expected profit of bidding at time $t$, given $s_0$.

For $s_0 < 0$, the expected profit is decreasing with $t$. It is increasing
for $s_0 > 0$. Intuitively, when $s_0 < 0$, the cost of performing the contract
at time zero is lower than the average cost $m$, but from Eq. (2.45) this
advantage tends to fade out in time. Therefore, the expected profit is
decreasing in time. The converse is true when $s_0 > 0$. The a priori
expected profit at time $t$ is
$$V_1(t) = \langle V_1(t, s_0) \mid \mathcal{E} \rangle = \tfrac{1}{2} \left( 1 - \tfrac{m}{2} \right)^{2} + \tfrac{1}{8}\, \sigma_s^{2}\, \lambda^{2t} \tag{2.49}$$

where $\sigma_s^{2}$ is the variance of $s$. $V_1(t)$ is shown in Fig. 2.9.

Figure 2.9. The a priori expected profit of the bidding at time $t$
with perfect information about $s(0)$.

It is easy to show that the expected profit of the bidding with no
information is $\tfrac{1}{2}(1 - \tfrac{m}{2})^{2}$. Therefore, the a priori value of perfect
information about $s(0)$ at time $t$ is $\tfrac{1}{8} \sigma_s^{2} \lambda^{2t}$. Consequently the
information is always perishing and the rate of information perishing is $1/\lambda^{2}$.
If our observation of $s(0)$ is not perfect, then $\sigma_s^{2}$ in (2.49) will be
replaced by $\sigma_{\tilde{s}_0}^{2}$, namely the variance of the posterior mean of $s(0)$,
given the observation. We notice that the rate of information perishing
remains the same as the case of the perfect information.
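The chain from (2.48) to the value of information can be traced numerically. The sketch below is illustrative only: the parameter values are hypothetical, and the two-point distribution for $s_0$ is an assumption used to check the averaging step behind (2.49).

```python
def profit_given_s0(t, s0, m, lam):
    """Eq. (2.48): maximum expected profit at time t given s(0) = s0."""
    return 0.5 * (1.0 - 0.5 * m - 0.5 * lam ** t * s0) ** 2

def prior_expected_profit(t, m, lam, var_s):
    """Eq. (2.49): expectation of (2.48) over s0 with zero mean and variance var_s."""
    return 0.5 * (1.0 - 0.5 * m) ** 2 + 0.125 * var_s * lam ** (2 * t)

def value_of_information(t, m, lam, var_s):
    """Prior expected profit with s(0) known, minus the no-information profit."""
    return prior_expected_profit(t, m, lam, var_s) - 0.5 * (1.0 - 0.5 * m) ** 2

m, lam, var_s = 1.0, 0.8, 0.04           # hypothetical parameters

# Averaging (2.48) over s0 = +0.2 or -0.2 with equal probability (variance 0.04).
two_point = 0.5 * (profit_given_s0(3, 0.2, m, lam) + profit_given_s0(3, -0.2, m, lam))

values = [value_of_information(t, m, lam, var_s) for t in range(10)]
rates = [values[t] / values[t + 1] for t in range(9)]
```

Every entry of `rates` equals $1/\lambda^{2}$ up to rounding, which is the constant perishing rate stated in the text.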
2.9  Information Outdating in Autoregressive Systems

A process which is used frequently for modeling real-world
processes is the autoregressive process [4]. The autoregressive
process of order $p$ is defined by the equation

$$s(t) = \alpha_1 s(t-1) + \alpha_2 s(t-2) + \cdots + \alpha_p s(t-p) + e(t) \tag{2.50}$$

The state of the system at each time depends on its states at the last
$p$ points of time, and a random noise $e(t)$. We assume that the $e(t)$'s
are independent and identically distributed and $s(t)$ is stationary.*
For simplicity, we assume that $s(t)$ is a single variable. Equation
(2.50) can also be written as a linear Markovian system in matrix form:

$$\begin{bmatrix} s(t) \\ s(t-1) \\ \vdots \\ s(t-p+1) \end{bmatrix}
=
\begin{bmatrix}
\alpha_1 & \alpha_2 & \cdots & \cdots & \alpha_p \\
1 & 0 & \cdots & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}
\begin{bmatrix} s(t-1) \\ s(t-2) \\ \vdots \\ s(t-p) \end{bmatrix}
+
\begin{bmatrix} e(t) \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{2.51}$$

The first row in (2.51) is the same as equation (2.50) and the rest of
the rows are identities $s(t-i) = s(t-i)$ for $i = 1, 2, \ldots, p-1$.
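The equivalence between the scalar recursion (2.50) and the matrix form (2.51) can be illustrated by simulating both with the same noise sequence. This is a sketch under stated assumptions: the AR(3) coefficients, the zero initial history, and the Gaussian noise are all hypothetical choices.

```python
import random

def companion(alphas):
    """Coefficient matrix of Eq. (2.51) for AR coefficients alpha_1..alpha_p."""
    p = len(alphas)
    rows = [list(alphas)]
    for i in range(1, p):
        rows.append([1.0 if j == i - 1 else 0.0 for j in range(p)])
    return rows

def step(matrix, state, noise):
    """One step of the Markovian form; the noise enters the first component only."""
    new_state = [sum(a * x for a, x in zip(row, state)) for row in matrix]
    new_state[0] += noise
    return new_state

alphas = [0.5, -0.3, 0.1]                      # hypothetical AR(3) coefficients
rng = random.Random(0)
noises = [rng.gauss(0.0, 1.0) for _ in range(50)]

# Direct recursion (2.50), starting from a zero history.
history = [0.0, 0.0, 0.0]                      # s(t-1), s(t-2), s(t-3)
direct = []
for e in noises:
    s_t = sum(a * x for a, x in zip(alphas, history)) + e
    direct.append(s_t)
    history = [s_t] + history[:-1]

# Matrix form (2.51), tracking the first component of the state vector.
state = [0.0, 0.0, 0.0]
matrix_form = []
for e in noises:
    state = step(companion(alphas), state, e)
    matrix_form.append(state[0])
```

Both simulations produce the same path, since the rows below the first only shift the past values down the state vector.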


*The condition for the stationarity of $s(t)$ is that all the roots of the
characteristic polynomial

$$x^{p} - \alpha_1 x^{p-1} - \alpha_2 x^{p-2} - \cdots - \alpha_p = 0$$

lie inside the unit circle.


Since this system is Markovian, according to Theorem 2.1 the value of
perfect information about the state vector of (2.51) at time $t-\tau$ (that
is, perfect information about $p$ consecutive states of the original
system: $s(t-\tau)$, $s(t-\tau-1)$, ..., $s(t-\tau-p+1)$) always decreases as
delay $\tau$ increases. We will see, however, that the value of perfect
information may be increasing with delay if the information does not
include all the $p$ consecutive states in the past. We are interested, in
particular, in the value of information about the state at only one point
of time in the past. Assuming a quadratic payoff function in a single
variable, $s(t)$,

$$v(s,d) = g \cdot s(t) \cdot d + \tfrac{1}{2}\, h \cdot d^{2} \tag{2.52}$$

the value of information structure $\eta$ with delay $\tau$ from Eq. (2.22)
reduces to:

$$V_{\eta(\tau)} = m\, \sigma_{\tilde{s}_0}^{2}\, r_{\tau}^{2} \tag{2.53}$$

where $m = -\tfrac{1}{2} \cdot g^{2}/h$ is a constant,
$\sigma_{\tilde{s}_0}^{2} = {}^{v}\langle \langle s(t) \mid \eta(s(t)), \mathcal{E} \rangle \mid \mathcal{E} \rangle$ is
the variance of the posterior mean with fresh information and is
constant by the stationarity assumption, and $r_{\tau}$ is the coefficient of
the linear approximation of the mean of $s(t)$ as a function of $s(t-\tau)$.
If $s(t)$ has normal distribution (which is typically the case because
of the additive form of the process (2.50)), (2.53) is exact and $r_{\tau}$ is
the correlation coefficient between $s(t)$ and $s(t-\tau)$. Note that
the behavior of $V_{\eta(\tau)}$ depends only on the behavior of $r_{\tau}$. In
particular, and in contrast to the vector case, there is no difference
in the behavior of $V_{\eta(\tau)}$ whether information is perfect or not.

If we multiply both sides of (2.50) by $s(t-\tau)$, take the
expected value, and divide both sides by the variance of $s(t)$, we get

$$r_{\tau} = \alpha_1 r_{\tau-1} + \alpha_2 r_{\tau-2} + \cdots + \alpha_p r_{\tau-p} \tag{2.54}$$

Note that $r_{\tau}$ satisfies the same difference equation as the equation
for $s(t)$ with $e(t) = 0$. $r_{\tau}$ is therefore the same as the
homogeneous solution of $s(t)$, and the value of information is a constant
times the square of this homogeneous solution. The solution to the
difference equation (2.54) can be written as

$$r_{\tau} = \sum_{i=1}^{p} c_i\, \lambda_i^{\tau} \tag{2.55}$$

where $c_i$ is constant and $\lambda_i$ is the $i$th root of the characteristic
equation of (2.54) (or equivalently the eigenvalues of matrix $A$ in
(2.51)). In the following example we will study the autoregressive
process of the second order (AR(2)) in more detail.
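For the second-order case, the recursion (2.54) and the closed form (2.55) can be compared directly. The sketch below makes several assumptions that are not spelled out in the text at this point: the starting values $r_0 = 1$ and $r_1 = \alpha_1/(1-\alpha_2)$ (obtained from (2.54) at $\tau = 1$ with $r_{-1} = r_1$), real and distinct characteristic roots, and hypothetical coefficient values.

```python
import math

def correlations(alpha1, alpha2, max_tau):
    """r_0 .. r_max_tau for a stationary AR(2) process via recursion (2.54)."""
    r = [1.0, alpha1 / (1.0 - alpha2)]
    for tau in range(2, max_tau + 1):
        r.append(alpha1 * r[tau - 1] + alpha2 * r[tau - 2])
    return r[:max_tau + 1]

def closed_form(alpha1, alpha2, max_tau):
    """Eq. (2.55) for AR(2), assuming real and distinct characteristic roots."""
    disc = math.sqrt(alpha1 ** 2 + 4.0 * alpha2)
    lam1, lam2 = (alpha1 + disc) / 2.0, (alpha1 - disc) / 2.0
    r1 = alpha1 / (1.0 - alpha2)
    # Constants fixed by the two starting values r_0 = 1 and r_1.
    c1 = (r1 - lam2) / (lam1 - lam2)
    c2 = 1.0 - c1
    return [c1 * lam1 ** tau + c2 * lam2 ** tau for tau in range(max_tau + 1)]

alpha1, alpha2 = 0.5, 0.3                  # hypothetical stationary coefficients
r_rec = correlations(alpha1, alpha2, 12)
r_cf = closed_form(alpha1, alpha2, 12)
relative_value = [x * x for x in r_rec]    # V(tau) / V(0), from Eq. (2.53)
```

The two lists agree to rounding error, and `relative_value` gives the value of information at each delay as a fraction of its fresh value.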

Example 2.3. Information outdating in AR(2): for AR(2) we have

$$s(t) = \alpha_1 s(t-1) + \alpha_2 s(t-2) + e(t) \tag{2.56}$$

The roots of the characteristic equation of this system are:

$$\lambda_{1,2} = \frac{\alpha_1 \pm \sqrt{\alpha_1^{2} + 4\alpha_2}}{2}$$

To have a stationary process, $\lambda_1$ and $\lambda_2$ must lie inside the unit
circle. This condition gives us the following conditions on $\alpha_1$ and
$\alpha_2$:

$$\alpha_2 + \alpha_1 < 1, \qquad \alpha_2 - \alpha_1 < 1, \qquad -1 < \alpha_2 < 1 \tag{2.57}$$

Points $(\alpha_1, \alpha_2)$ satisfying conditions (2.57) lie inside the triangle of
Fig. 2.10. In the shaded region of the triangle $\alpha_1^{2} + 4\alpha_2 < 0$, and
the roots of the characteristic equation are complex. For a stationary
process the following observations are made (proofs are simple and have
been omitted):


(1) $V_{\eta(\tau)}(\alpha_1, \alpha_2) = V_{\eta(\tau)}(-\alpha_1, \alpha_2) \quad \forall\, \eta, \tau$;
that is, the value of any information structure $\eta$ with any delay $\tau$ is
the same for two processes with the same $\alpha_2$ but the two $\alpha_1$ being
negative of each other. In other words, $V_{\eta(\tau)}$ is symmetric with
respect to the axis $\alpha_2$ in Fig. 2.10. Note that in the process with
$\alpha_1 > 0$, $s(t)$ changes much slower than does $s(t)$ in the process with
$\alpha_1 < 0$ (Fig. 2.11), but the value of information changes at the same
rate for both processes.

(2) $V_{\eta(\tau)}(\alpha_1, \alpha_2) \ge V_{\eta(\tau)}(\alpha_1, -\alpha_2)$ for $\alpha_2 > 0$, for all $\eta$
and $\tau$; that is, for two processes with the same $\alpha_1$, and with the two
$\alpha_2$ being negative of each other, the value of any information
structure with any delay will be greater for the process with positive
$\alpha_2$.

(3) $V_{\eta(\tau)}(\alpha_1, \alpha_2) \ge V_{\eta(\tau+1)}(\alpha_1, \alpha_2)$ for all $\eta$ and $\tau$, only if
$(\alpha_1, \alpha_2)$ lie in the shaded region of Fig. 2.12.

Figure 2.12. Information perishing region in AR(2).

(4) For the shaded region of Fig. 2.13, $\rho_1 = V_{\eta(1)} / V_{\eta(2)}$ is always
(for all $\eta$) less than one, that is, the information is always enhanced if
the  delay  increases  from  1  to  2.  We  can  also  show  that  —  1 

for  all  T  »  and  is  always  decreasing  for  large  T  . 
Figure 2.13. Information enhancing region (from $\tau = 1$ to
$\tau = 2$) in AR(2).

The result that the value of information with two units of delay is
always more than the value of information with one unit of delay is easy
to interpret in this case. Note that in the shaded region of Fig. 2.13,
$\alpha_2 > |\alpha_1|$. Therefore, from (2.56) $s(t)$ is determined more by $s(t-2)$ than
by $s(t-1)$, and thus it is more valuable to know $s(t-2)$ than $s(t-1)$.

(5) In the shaded region of Fig. 2.10 where $\lambda_1$ and $\lambda_2$ are
complex, $r_{\tau}$ is a damped oscillatory function of $\tau$. $V_{\eta(\tau)}$, therefore,
has the shape of Fig. 2.14.

Figure 2.14. $V_{\eta(\tau)}$ in AR(2) with complex
characteristic roots.
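Observations (1), (4), and (5) can be spot-checked numerically through the relation $V_{\eta(\tau)} / V_{\eta(0)} = r_{\tau}^{2}$ implied by (2.53). The sketch below is only a check at a few hypothetical parameter points, not a proof; the starting value $r_1 = \alpha_1/(1-\alpha_2)$ is the standard consequence of (2.54) at $\tau = 1$ and is an assumption in the sense that the text does not state it.

```python
def correlations(alpha1, alpha2, max_tau):
    """r_0 .. r_max_tau of a stationary AR(2) process (max_tau >= 1)."""
    r = [1.0, alpha1 / (1.0 - alpha2)]
    for _ in range(2, max_tau + 1):
        r.append(alpha1 * r[-1] + alpha2 * r[-2])
    return r

def relative_values(alpha1, alpha2, max_tau):
    """V(tau) / V(0) = r_tau ** 2, from Eq. (2.53)."""
    return [x * x for x in correlations(alpha1, alpha2, max_tau)]

# Observation (1): flipping the sign of alpha_1 leaves the value unchanged.
plus = relative_values(0.4, 0.3, 10)
minus = relative_values(-0.4, 0.3, 10)

# Observation (4): a point with alpha_2 > |alpha_1|; delay 2 beats delay 1.
enhanced = relative_values(0.1, 0.5, 10)

# Observation (5): complex roots (alpha_1**2 + 4*alpha_2 < 0); the value oscillates.
oscillating = relative_values(1.0, -0.8, 30)
rises = [t for t in range(1, 30) if oscillating[t + 1] > oscillating[t]]
```

The list `rises` collects the delays at which the value of information increases again; for the complex-root point it is nonempty, reflecting the damped oscillation of Fig. 2.14.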
It is easy to see why the value of information may be increasing
with delay or even oscillate in autoregressive systems. In the equation
for autoregressive systems,

$$s(t) = \alpha_1 s(t-1) + \alpha_2 s(t-2) + \cdots + \alpha_p s(t-p) + e(t)$$

it is intuitively clear that $s(t)$ is influenced more by a state with a
larger coefficient, and therefore information about this state can be more
valuable than the information about a more recent state. We did not have
this clear interpretation for the Markovian system. However, since we can
transform a Markovian system into a form equivalent to an autoregressive
system, the results for the Markovian system can be interpreted in the
same manner. Consider the Markovian system

$$s(t) = A\, s(t-1) + e(t)$$

By a transformation $x(t) = B\, s(t)$, where $B$ is an invertible matrix, we
have

$$x(t) = B A B^{-1} x(t-1) + B\, e(t) \tag{2.58}$$

The matrix $B$ can be chosen such that $\tilde{A} = B A B^{-1}$ has the form
(companion matrix):

$$\tilde{A} = \begin{bmatrix}
\tilde{\alpha}_1 & \tilde{\alpha}_2 & \cdots & \cdots & \tilde{\alpha}_p \\
1 & 0 & \cdots & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}$$

Comparing with Eq. (2.51) we notice that the new system represents an
autoregressive system of order $p$ (the dimension of $A$):

$$x_1(t) = \tilde{\alpha}_1 x_1(t-1) + \tilde{\alpha}_2 x_1(t-2) + \cdots + \tilde{\alpha}_p x_1(t-p) + e(t) \tag{2.59}$$

Our information in the new system is about the vector $[x_1(t), x_1(t-1),
\ldots, x_1(t-p+1)]$, and in view of (2.59) it is easy to see why the value of
information may be increasing with delay.


2.10  Comparative  Values  of  Delayed  Information  Structures 

In this section we will explore the changes in the relative values
of information structures as they become old. Consider the following
question: Information structure $\eta$ is more valuable than $\eta'$ when
both $\eta$ and $\eta'$ are fresh. Is $\eta$ more valuable than $\eta'$ when both
have a delay $\tau$? It is clear that the answer to this question may
depend not only on $\eta$ and $\eta'$ (and $\tau$), but also on the decision for
which the information is used, as well as on the properties of the dynamic
system. We can think of cases for which $\eta$ is always perishing but $\eta'$
may be enhancing in some interval, such that $\eta'(\tau)$ is more valuable
than $\eta(\tau)$ (see Fig. 2.15). If, however, $\eta$ and $\eta'$ are such that the
superiority of $\eta$ over $\eta'$ is not limited to a particular decision,
then such a property may be preserved or reduced to weaker but similar
properties as $\eta$ and $\eta'$ are delayed. Some comparison of
information structures can be made according to the following definitions [9]:

Definition (1). $Z \mathrel{f} Z'$: set $Z$ of observations $z$ ($z = \eta(s)$) is
finer than set $Z'$ of observations $z'$ ($z' = \eta'(s)$), Fig. 2.16 ($\eta$ and
$\eta'$ must be many-to-one or so called "noiseless" mappings).

Figure 2.16. Set $Z$ finer than set $Z'$.

As shown in Fig. 2.16, the partition of the set $S$ (of state
variable $s$) by $\eta$ is finer than the partition by $\eta'$.

Definition (2). $Z \mathrel{g} Z'$: $Z$ is "garbled" into $Z'$. $Z \mathrel{g} Z'$ if for
all $z \in Z$ and $z' \in Z'$ the conditional probability of $z'$ given $z$ depends
only on $z$ and $z'$. In particular

$$\{ z' \mid z, s, \mathcal{E} \} = \{ z' \mid z, \mathcal{E} \}$$

that is, if $z$ is known, the distribution of $z'$ does not depend on the
state $s$ itself. This is also called "cascaded information" (see Fig.
2.17).

Figure 2.17. Cascaded information.

Definition (3). $\eta > \eta'$: $\eta$ is "more informative" than $\eta'$.
$\eta > \eta'$ if every potential user would prefer $\eta$ over $\eta'$.

It can be shown that [9]:

$$Z \mathrel{f} Z' \;\Rightarrow\; Z \mathrel{g} Z' \;\Rightarrow\; \eta > \eta' \tag{2.60}$$

but the converse relations do not hold in general. We will see in the
following that some of the relations defined in Definitions (1), (2) and
(3) are preserved or reduced to weaker relations as information becomes old.

I. Let us suppose that $Z \mathrel{f} Z'$. Which of the relations (1),
(2), or (3) hold for delayed information? Let us denote the delayed
information by $y(t) = \zeta(s(t))$. We have
$y(t) = \zeta(s(t)) = \eta(s(t-\tau)) = z(t-\tau)$.

(1) $Y \mathrel{f} Y'$ does not hold (in general) because $\zeta$ and $\zeta'$
are not noiseless.

(2) $Y \mathrel{g} Y'$ holds because

$$\{ y'(t) \mid y(t), s(t), \mathcal{E} \} = \{ z'(t-\tau) \mid z(t-\tau), s(t), \mathcal{E} \}$$

but by (2.60) $Z \mathrel{g} Z'$ holds and we have:

$$\{ z'(t-\tau) \mid z(t-\tau), s(t), \mathcal{E} \} = \{ z'(t-\tau) \mid z(t-\tau), \mathcal{E} \}$$

Therefore

$$\{ y'(t) \mid y(t), s(t), \mathcal{E} \} = \{ y'(t) \mid y(t), \mathcal{E} \}$$

(3) $\zeta > \zeta'$ holds by (2.60) because $Y \mathrel{g} Y'$ holds.

II. Let $Z \mathrel{g} Z'$; then for delayed information we have

(1) $Y \mathrel{f} Y'$ clearly does not hold in general.

(2) $Y \mathrel{g} Y'$ holds by the same argument as in I-(2), above.

(3) $\zeta > \zeta'$ holds because $Y \mathrel{g} Y'$ holds.

III. Let $\eta > \eta'$; then for delayed information:

(1) $Y \mathrel{f} Y'$ clearly does not hold in general.

(2) $Y \mathrel{g} Y'$ clearly does not hold in general.

(3) $\zeta > \zeta'$ does not hold in general, but it holds for Markovian
systems. The following example shows that $\zeta > \zeta'$ does not hold in general.
Let

$$z(t) = \eta(s(t)) = s(t) \quad \text{be perfect information}$$

$$z'(t) = \eta'(s(t)) = s(t-1) \quad \text{be perfect, but one unit delayed, information}$$

Clearly $\eta > \eta'$, but if both $\eta$ and $\eta'$ have one unit delay we have

$$y(t) = \zeta(s(t)) = \eta(s(t-1)) = s(t-1)$$

$$y'(t) = \zeta'(s(t)) = \eta'(s(t-1)) = s(t-2)$$

and $\zeta$ is not, in general, more informative than $\zeta'$. For example,
in the autoregressive system of the second order we showed (Sec. 2.9) that if
$\alpha_2 > |\alpha_1|$, then knowing $s(t-2)$ is more valuable than knowing $s(t-1)$.
To show that $\zeta > \zeta'$ holds for Markovian systems, let $v(s(t),d)$ be a
payoff function, and the Markovian system be represented by

$$s(t) = f(s(t-1))$$

where $f$ is, in general, a one-to-many mapping and can be assumed,
without loss of generality, to remain the same over time. We then have

$$s(t-1) = f(s(t-2))$$

$$s(t-2) = f(s(t-3))$$

Therefore,

$$s(t) = f^{(\tau)}(s(t-\tau))$$

where by $f^{(\tau)}$ we mean $\tau$ times applying of $f$. The payoff at time
$t$ is

$$v(s(t),d) = v(f^{(\tau)}(s(t-\tau)), d)$$

Now let

$$w(s(t-\tau),d) = \langle v(f^{(\tau)}(s(t-\tau)), d) \mid s(t-\tau), \mathcal{E} \rangle$$

Then we have

$$\langle v(s(t),d) \mid y(t) = \eta(s(t-\tau)), \mathcal{E} \rangle
= \int_{s(t-\tau)} \langle v(s(t),d) \mid \eta(s(t-\tau)), s(t-\tau), \mathcal{E} \rangle \, \{ s(t-\tau) \mid \eta(s(t-\tau)), \mathcal{E} \}$$

$$= \int_{s(t-\tau)} w(s(t-\tau),d) \, \{ s(t-\tau) \mid \eta(s(t-\tau)), \mathcal{E} \}
= \langle w(s(t-\tau),d) \mid \eta(s(t-\tau)), \mathcal{E} \rangle$$

Therefore,

$$\max_{d} \langle v(s(t),d) \mid \eta(s(t-\tau)), \mathcal{E} \rangle = \max_{d} \langle w(s(t-\tau),d) \mid \eta(s(t-\tau)), \mathcal{E} \rangle \tag{2.61}$$

and, similarly,

$$\max_{d} \langle v(s(t),d) \mid \eta'(s(t-\tau)), \mathcal{E} \rangle = \max_{d} \langle w(s(t-\tau),d) \mid \eta'(s(t-\tau)), \mathcal{E} \rangle \tag{2.62}$$

but since $\eta$ is more informative than $\eta'$, $\eta$ is more valuable
than $\eta'$ for any payoff function and in particular for $w(s,d)$. It
follows that the right-hand side of (2.61) is greater than the right-hand
side of (2.62); therefore, the left-hand side of (2.61) is greater than
the left-hand side of (2.62):

$$\max_{d} \langle v(s(t),d) \mid \zeta(s(t)), \mathcal{E} \rangle \ge \max_{d} \langle v(s(t),d) \mid \zeta'(s(t)), \mathcal{E} \rangle$$

and since $v(s,d)$ is an arbitrary function, it follows that $\zeta > \zeta'$.

Our  work  in  this  section  has  been  more  of  an  exploratory  type.  We 

showed  that  the  relative  advantage  of  one  information  structure  over 

another  is  not,  in  general,  preserved  when  both  information  structures 

are  delayed.  We  found,  however,  that  such  a  relation  would  be  preserved 

(or  somewhat  weakened),  if  the  advantage  of  one  information  structure 

over  the  other  is  not  limited  to  a  particular  payoff  function. 

2.11  Summary 

The  process  of  outdating  of  information  in  a  dynamic  environment 
has  been  investigated  in  this  chapter.  We  have  shown  how  this  process 
depends  on  the  various  factors  which  influence  it,  namely  the  dynamics 
of  the  environment,  the  decision  for  which  the  information  is  used,  and 
the  properties  of  the  information  structure  itself  (perfect  or  imperfect) . 
Assuming  a  quadratic  payoff  function,  the  value  of  an  information 
structure  was  calculated  as  a  function  of  its  age.  The  result  was  written 
in  a  form  which  shows  separately  the  effect  of  each  factor  on  the  infor¬ 
mation  outdating  process: 
$$V_{\eta(\tau)} = \operatorname{tr}\left[ M \cdot R(\tau)\, \Sigma_{\tilde{s}_0}\, R'(\tau) \right] \tag{2.63}$$

$V_{\eta(\tau)}$, the value of the information structure $\eta$ with delay $\tau$,
is the trace of the product of four matrices, which represent the
factors influencing the outdating of information. The matrix $M$
represents the effect of the payoff function on the value of
information ($M$ is determined by the coefficient matrices of the payoff
function). The information structure influences the value of information
through matrix $\Sigma_{\tilde{s}_0}$ of the covariance of the posterior mean of the
state, given fresh information. Matrices $R(\tau)$ and $R'(\tau)$ (transpose
of $R(\tau)$) represent the dynamics of the state (environment). $R(\tau)$
is the coefficient of the linear approximation of the mean of the
state at time $t$, as a function of the state at time $t - \tau$. If the
state has a normal distribution, $R(\tau)$ is the "multiple correlation"
between $s(t)$ and $s(t - \tau)$. Since we assume a stationary state,
$R(\tau)$ is independent of $t$.

As we can see from Eq. (2.63), the main determinants of the dynamics
of the information are the dynamics of the state, as represented by the
matrix $R(\tau)$. Nevertheless, the payoff function and the information
structure itself can drastically influence the dynamics of information.
Specific results have been obtained for the linear Markovian system,

$$s(t) = A \cdot s(t-1) + e(t)$$

where $A$ is a constant matrix and $e(t)$ is a random noise. We have
shown that the eigenvalues of matrix $A$ play an important role in
determining  the  pattern  of  the  information  outdating  process.  It  was 
found  that  the  value  of  information  may  increase  with  delay  or  may  even 
oscillate.  This  is  a  rather  counter-intuitive  result,  especially  for  a 
Markovian  system.  We  have  shown,  however,  that  if  the  information  is 
perfect,  its  value  will  always  decrease  with  delay,  regardless  of  the 
dynamics  of  the  state  or  the  parameters  of  the  payoff  function.  We  have 
found  other  conditions  under  which  the  information  is  always  perishing. 
For  these  cases,  bounds  are  found  for  the  rate  of  information  perishing. 
These  bounds  are  determined  by  the  smallest  and  the  largest  eigenvalues 
of  matrix  A  .  The  results  for  the  Markovian  system  are  summarized 
in  Table  2.1. 
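
The oscillation result can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the value of delayed information to be $\mathrm{tr}(M \, R \, \Sigma \, R')$ with $R = A^\tau$ (the ordering of the four matrices and the numerical values of `A`, `M`, and `Sigma` are assumptions made here, not taken from the text), and it uses a matrix `A` whose eigenvalues are purely imaginary.

```python
# Sketch: value of delayed information in a 2-D linear Markovian system.
# Assumption (not from the text): V(tau) = trace(M * R * Sigma * R^T), R = A^tau.

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def value_with_delay(A, M, Sigma, max_delay):
    """Return [V(0), ..., V(max_delay)] for V(tau) = tr(M A^tau Sigma (A^tau)')."""
    n = len(A)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    values = []
    for _ in range(max_delay + 1):
        values.append(trace(matmul(M, matmul(R, matmul(Sigma, transpose(R))))))
        R = matmul(A, R)  # advance to A^(tau + 1)
    return values

# Hypothetical numbers: A has eigenvalues +0.9i and -0.9i; M and Sigma
# weight only the first component of the state.
A = [[0.0, -0.9], [0.9, 0.0]]
M = [[1.0, 0.0], [0.0, 0.0]]
Sigma = [[1.0, 0.0], [0.0, 0.0]]
values = value_with_delay(A, M, Sigma, 4)
```

With these illustrative numbers the value drops to zero at delay 1 and is positive again at delay 2, which is the kind of non-monotonic behavior described above for complex eigenvalues.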

Table 2.1 Dynamics of Information in a Linear Markovian System

Properties of the state, the           Dynamics of the information
information, and the decision
-------------------------------------  ----------------------------------------
$\eta$ perfect                         Information always perishing
$A, M, \Sigma_{\bar{s}_0} \geq 0$      Information always perishing;
                                       $P(\tau) \geq$ [...]
$A$ and $M$ diagonal                   Information always perishing;
                                       $1/\lambda_1^2 \geq P(\tau) \geq 1/\lambda_N^2$
$A$ and $\Sigma_{\bar{s}_0}$ diagonal  Information always perishing;
                                       $1/\lambda_1^2 \geq P(\tau) \geq 1/\lambda_N^2$
$A$ has complex eigenvalues            The value of the information may
                                       oscillate with delay.

$\lambda_1$ and $\lambda_N$ are the smallest and the largest eigenvalues of matrix $A$.


We have also investigated the information outdating process for
the autoregressive systems. Although an autoregressive system can be
represented by a Markovian system of a higher dimension, the study was
useful in gaining insight into the process of information outdating.
In particular, it helped to show more clearly why the value of the
information may be enhanced or may oscillate with time.

Finally,  we  have  made  some  comparisons  between  delayed  information 
structures.  We  have  investigated  whether  various  relations  between  two 
information  structures  (regarding  the  superiority  of  one  over  another) 
are preserved when both information structures are delayed. The
relations studied were the following: (1) $Z \, f \, Z'$: set $Z$ of
observation $\eta$ is finer than set $Z'$ of observation $\eta'$; (2) $Z \, g \, Z'$:
set $Z$ of observation $\eta$ is "garbled" into set $Z'$ of observation
$\eta'$; (3) $\eta \geq \eta'$: observation $\eta$ is "more informative" than
observation $\eta'$. The first relation is the strongest (with regard to
$\eta$ being superior to $\eta'$), and the third relation is the weakest. When
the information structures $\eta$ and $\eta'$ are delayed, some of the relations
are preserved, and some are reduced to weaker relations. The results
are summarized in Table 2.2.
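Table 2.2 Comparison Between Delayed Information Structures

Fresh Information     Delayed Information
--------------------  --------------------------------------------------
$Z_0 \, f \, Z_0'$    $Z_\tau \, f \, Z_\tau'$ does not hold;
                      $Z_\tau \, g \, Z_\tau'$ holds; $\eta_\tau \geq \eta_\tau'$ holds
$Z_0 \, g \, Z_0'$    $Z_\tau \, g \, Z_\tau'$ holds; $\eta_\tau \geq \eta_\tau'$ holds
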

CHAPTER  3 


RANDOMLY  OCCURRING  DECISIONS  AND  INFORMATION  RECOVERY 

In  Chapter  2  the  process  of  outdating  of  information  was  studied. 
There  are  many  decision  situations  where  we  must  deal  with  old  (unfresh) 
information.  In  such  cases,  obtaining  fresh  information  at  the  time 
of  the  decision  may  be  very  time  consuming  or  extremely  expensive. 

One  of  the  most  important  cases,  where  the  use  of  unfresh  information 
may  be  inevitable,  is  when  the  time  of  the  decision  is  uncertain.  The 
decision  may  or  may  not  have  to  be  made  at  each  point  of  time.  Moreover, 
when  the  time  of  the  decision  comes,  it  must  be  made  within  a  short 
period  of  time,  thus  making  it  impossible  to  obtain  fresh  information 
for  the  decision.  We  may  think  of  this  type  of  decision  as  one  which 
must  be  made  upon  the  occurrence  of  a  precipitating  event  with  random 
time. Often the decision maker has no control over this event. Following
Grum [5] we use the term "contingent" for this type of decision.

The  precipitating  event  in  a  contingent  decision  is  often  an  action  by 
another  party.  For  example,  firms  have  to  make  contingent  decisions  in 
response to either government actions (e.g. regulations, bidding
contracts, etc.), or actions by their competitors (e.g. changing prices,
introducing new products, etc.). The government is faced with contingent
decisions, often as a result of economic or political decisions by foreign
countries (e.g. outbreak of wars, economic embargoes, etc.). Since we
are  dealing  with  unfresh  information  in  making  such  decisions,  and  in 
view  of  the  information  being  outdated  in  time,  it  may  be  desirable  to 
be  prepared  for  the  decision  by  regular  recovery  (updating)  of  information. 




In  this  chapter  two  approaches  to  the  recovery  of  information  are 
introduced.  The  optimal  recovery  of  information  according  to  these 
approaches  is  studied  in  Chapters  4  and  5. 

3.1 The Contingency Decision Model

As mentioned before, a contingent decision may be thought of as a
decision  which  must  be  made  upon  the  occurrence  of  a  precipitating  event 
with  random  time.  We  denote  this  event  by  E  .  E  may  be  a 
single  event,  a  combination  of  simultaneous  events,  or  the  last  of  a 
chain  of  events.  In  any  case,  a  single  event  E  can  represent  all  these 
situations.  In  Chapters  4  and  5  we  study  the  one-time  decision  case, 
namely  the  case  where  the  decision  happens  only  once.  In  Chapter  6  the 
results  are  extended  to  the  case  where  the  decision  may  be  repeated  in 
time.  For  the  one-time  decision  case  the  occurrence  of  the  decision  may 
be modeled as shown in Fig. 3.1. $g_t$ is the probability that event $E$
occurs at time $t$ (given that it did not occur before). Of course, $g_t$
may be revised as time passes. $E$ occurs only once.
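
The occurrence model can be made concrete with a short sketch. Given the conditional probabilities $g_t$, the unconditional probability that $E$ first occurs at time $t$ is $g_t$ times the probability that it has not occurred earlier. The function name `occurrence_probabilities` and the sample values are illustrative assumptions, not part of the original model description.

```python
def occurrence_probabilities(hazards):
    """Given g_t for t = 0, 1, ..., return p_t = g_t * prod_{s<t} (1 - g_s).

    p_t is the probability, assessed at time zero, that the precipitating
    event E occurs exactly at time t.
    """
    probs = []
    survive = 1.0  # probability that E has not occurred before time t
    for g in hazards:
        probs.append(survive * g)
        survive *= 1.0 - g
    return probs

# Hypothetical constant hazard g = 0.5 over three periods.
p = occurrence_probabilities([0.5, 0.5, 0.5])
```

For a constant $g$ this reduces to the geometric form $p_t = g(1-g)^t$ used in Chapter 4.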


Figure  3.1  Occurrence  model  for  a  contingent 
decision. 


Information  Recovery:  A  priori  vs.  A  posteriori  Policies 


As  mentioned  earlier,  for  contingent  decisions  we  may  find  it 
attractive  to  update  our  information  regularly  in  order  to  be  prepared 
for  the  decision.  Two  types  of  policies  for  information  recovery 
(updating)  are  studied  here. 

1. A priori policies for information recovery: In this type of
policy we use our prior knowledge about the outdating of information
to decide the times of all future observations (recoveries). In particular,
if $g_t$ (the probability of the decision occurring at time $t$ given
that it did not occur before) is constant and our system is stationary,
the information recovery for the infinite-horizon case will be periodic,
as shown in Fig. 3.2. In this figure the information is recovered
at times $0, T, 2T, \ldots$. $V_1(t)$ is the a priori expected payoff if the
decision occurs at time $t$. Note that the actual expected payoff at time
$t$ depends on the information available at time $t$, namely the result of
the observations before $t$. Denoting the sequence of all the observations
before $t$ by $Z_t$, we have

$$V_1(t) = \langle V_1(t, Z_t) \mid t, \mathcal{E} \rangle$$

where $V_1(t, Z_t)$ is the expected payoff at time $t$, given $Z_t$. We
can assume, without loss of generality, that the expected payoff with
no information is zero. Thus, $V_1(t)$ will be the a priori value of
the past information at time $t$. Therefore, this type of policy
concerns the a priori outdating of information and appropriate
information recovery schedules.

Figure 3.2 A priori recovery of information.

2.  A  posteriori  policies  for  information  recovery:  In  this  type 
of  policy  we  use  not  only  our  prior  information  about  the  system,  but  also 
the  result  of  the  previous  observations  to  determine  the  time  of  the  next 
recovery. Therefore, the time of each recovery may depend on the realization
of the previous observations. One may ask, however: Why should the
time of the next observation depend on the realization of the last
observations (particularly since we have no intention of controlling or
modifying the uncertain state; rather, all we want is to know the state and
then set our decision accordingly)? The following example illustrates
this matter. Consider a target in a field at which we would like to shoot if a
random  event  E  occurs  (assume  E  to  be  independent  of  the  position  or 
any  other  property  of  the  target).  The  target  is  constantly  moving  in 
an  uncertain  manner  and  we  cannot  see  its  location.  We  can  find  out 


about  its  location  at  any  time,  however,  at  some  cost.  When  E  occurs, 
we  have  to  shoot  at  the  target  immediately  (using  only  our  previously 
obtained information about its location), and our payoff depends on how
closely we hit it. If all that matters is how closely we hit the
target  regardless  of  where  the  target  is  in  the  field,  and  if  the  motion 
of  the  target  is  "independent"  of  its  position,  then  there  is  no  reason 
why  the  time  of  the  next  observation  of  the  location  of  the  target 
should  depend  on  its  location  in  the  previous  observations.  All  we  want 
is to know where the target is, but its position per se is of no
significance. However, if the field is not "homogeneous" in the sense that
different regions in the field have different degrees of importance,
then the time of the next observation may well depend on the locations
observed previously. Suppose, for example, that one region
of the field is of particular importance and therefore there is a very
high payoff for hitting the target in that region. Then if the last
observation shows that the target is in this region we may want to
observe its location more frequently. We will elaborate more on this
in Chapter 5, where we find some conditions which make the time of the
next recovery independent of the realization of the last observations.

The a posteriori information recovery problem may be thought of
as deciding at each period whether or not to buy information at that
period (given the state of information at that period). This is shown
in Fig. 3.3. At each time $t^-$ (recall that $t$ is discrete; $t^-$ denotes
slightly before $t$) we must decide whether or not to buy new information,
given our state of information at $t^-$, $y(t^-)$. If we buy information
we will learn $s(t)$ (assuming perfect information) with the distribution
$\{s(t) \mid y(t^-), \mathcal{E}\}$. If the event $E$ happens at $t$, then we will set
our decision $d$ according to the new information $s(t)$, and the net
payoff will be $\langle v(s(t), \hat{d}(s(t))) \mid y(t^-), \mathcal{E} \rangle - c$, where
$\hat{d}(s(t))$ denotes the optimal decision given $s(t)$, and $c$ is the cost
of information. If $E$ does not happen, we will proceed to time $t+1$ where
we are faced with the same decision, but with a new state of information
updated at time $t$. If we do not acquire information at $t$,
then if $E$ happens we have to set our decision $d$ according to our
previous information $y(t^-)$ and the payoff will be
$\langle v(s(t), \hat{d}(y(t^-))) \mid y(t^-), \mathcal{E} \rangle$.
If $E$ does not happen, we proceed to time $t+1$ where we are faced with
the same decision again, but our information is one unit older.

Figure 3.3 Decision model for the a posteriori recovery of information.

Note that since no new information is obtained during the intervals
between observations, we may try to calculate the next recovery
time  immediately  after  each  observation.  Therefore,  we  may  also  think 
of  our  information  recovery  problem  as  deciding  the  next  recovery  time 
after  observing  the  state  at  each  recovery. 

For  a  Markovian  state  with  perfect  observations,  the  result  of 
the  last  observation  of  the  state  is  all  the  information  needed  for 
deciding  the  next  recovery  time.  Consequently,  the  calculations  of  the 
optimal  policy  will  be  greatly  reduced. 

The a posteriori policies have an expected payoff at least as high as
that of the a priori policies, since any a priori schedule is also
available to an a posteriori policy. They are more difficult to calculate,
and perhaps more costly to implement, however.

In Chapter 4 we will investigate the optimal a priori policies
for information recovery. The a posteriori policies are studied in
Chapter 5.


CHAPTER  4 


A  PRIORI  OPTIMAL  INFORMATION  RECOVERY 

In  this  chapter  we  will  study  the  a  priori  optimal  information 
recovery  policies  for  contingent  decisions.  The  optimality  conditions 
are  found  for  finite  and  infinite  horizon  cases.  The  effect  of  risk 
aversion on the optimal information recovery policies is also investigated.

4.1  Optimal  Information  Recovery  for  the  Infinite-Horizon  Case 

Most of our results concern the infinite-horizon case. Assuming
that $s(t)$ is stationary and $g_t$ (the probability that the precipitating
event $E$ occurs at time $t$, given that it did not occur before) is
constant, the optimal information recovery is periodic. We denote the
recovery period by $T$. $V_1(t)$ was defined in Chapter 3 as the a priori
expected payoff of the decision given that it occurs at time $t$. We
have

$$V_1(t) = \langle V_1(t, Z_t) \mid t, T, \mathcal{E} \rangle \qquad (4.1)$$

where $V_1(t, Z_t)$ is the posterior expected payoff if the decision occurs
at $t$ ($Z_t$ is the sequence of all the observations before $t$).
Since $V_1(t)$ is periodic (Fig. 4.1), it is easier to measure $t$ from a
recovery time. Therefore we will consider $V_1(t)$ as the a priori
expected payoff $t$ time units after a recovery.
Figure 4.1 Information recovery for the infinite-horizon case.

In a contingent decision the time of the decision is uncertain.
We define $V(T)$ as the net expected payoff given only that the recovery
period is $T$ (but with the time of the decision uncertain). We have

$$V(T) = \langle V(t, Z_t) \mid T, \mathcal{E} \rangle \qquad (4.2)$$

where $V(t, Z_t)$ is the net expected payoff if the decision occurs at time
$t$ and $Z_t$ was observed in the previous observations. Notice that since
$V(T)$ is a net payoff, it takes into account the cost of information
as well as the payoff of the decision.

Theorem 4.1. If

(1) Information is always perishing (that is, $V_1(t)$ is a decreasing
function of $t$), and
(2) There exists at least one $T$ such that the net expected
payoff with recovery period $T$, $V(T)$, is positive,

then there exists a unique optimum recovery period $T^*$ (see Fig. 4.2).


Figure  4.2  Expected  payoff  as  a  function  of  the  recovery  period. 


Proof: Since the horizon is infinite, the expected payoff (benefit)
with recovery period $T$, $B(T)$, can be written as

$$B(T) = \sum_{t=0}^{T-1} p_t \, V_1(t) + p_{t \geq T} \cdot B(T) \qquad (4.3)$$

where $p_t = g(1-g)^t$ is the probability (at time zero) that the decision
will occur at time $t$, and $p_{t \geq T} = (1-g)^T$ is the probability that the
decision will not occur before $T$. The first term on the right-hand side
of (4.3) is the expected payoff in the first period and the second term is
the expected payoff after the first period. Note that we have not
discounted future payoffs. This was done only for simplicity, and
there is no difficulty in using the discounted values. From (4.3) we
have


$$B(T) = \frac{\sum_{t=0}^{T-1} p_t \, V_1(t)}{1 - p_{t \geq T}} = \frac{\sum_{t=0}^{T-1} g(1-g)^t \, V_1(t)}{1 - (1-g)^T} \qquad (4.4)$$

If the cost of each recovery of information is $c$, the expected cost
of information is

$$C(T) = c + c(1-g)^T + c(1-g)^{2T} + \cdots = \frac{c}{1 - (1-g)^T} \qquad (4.5)$$

The net expected payoff is therefore

$$V(T) = B(T) - C(T) = \frac{\sum_{t=0}^{T-1} g(1-g)^t \, V_1(t) - c}{1 - (1-g)^T} \qquad (4.6)$$

Now we can show that

(1) If $V(T_0) > V(T_0+1)$, then $V(T_0+1) > V(T_0+2)$, and
(2) If $V(T_0) < V(T_0+1)$, then $V(T_0-1) < V(T_0)$.

The first statement implies that if $V(T)$ is decreasing from $T_0$ to
$T_0+1$, it will be decreasing for all $T > T_0$. The second statement
implies that if $V(T)$ is increasing from $T_0$ to $T_0+1$, it is
increasing for all $T < T_0$. From (1) and (2) it follows that if
$V(T)$ has a maximum, it will be unique. To prove (1) and (2), let us
find the expression for $V(T) - V(T+1)$. Letting $V_1(t) = \alpha_t \cdot V_1(0)$,
we have

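$$V(T) - V(T+1) = \frac{g(1-g)^T}{[1 - (1-g)^T][1 - (1-g)^{T+1}]} \cdot G(T) \qquad (4.7)$$

where

$$G(T) = g V_1(0) \left[ 1 + \alpha_1 (1-g) + \alpha_2 (1-g)^2 + \cdots + \alpha_{T-1} (1-g)^{T-1} \right] - \alpha_T V_1(0) \left[ 1 - (1-g)^T \right] - c \qquad (4.8)$$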

Since the term $g(1-g)^T / \{[1 - (1-g)^T][1 - (1-g)^{T+1}]\}$ is positive, it
is sufficient to show that

(1) $G(T) > 0 \Rightarrow G(T+1) > 0$
(2) $G(T) < 0 \Rightarrow G(T-1) < 0$

$G(T+1)$ can be written as

$$G(T+1) = G(T) + V_1(0) (\alpha_T - \alpha_{T+1}) [1 - (1-g)^{T+1}]$$

but by the information perishing assumption $\alpha_T > \alpha_{T+1}$, and the second
term on the right-hand side is positive; therefore

$$G(T) > 0 \Rightarrow G(T+1) > 0$$

Similarly, we have

$$G(T-1) = G(T) + V_1(0) (\alpha_T - \alpha_{T-1}) [1 - (1-g)^T]$$

and since $\alpha_T < \alpha_{T-1}$, the second term on the right-hand side is negative,
and we have

$$G(T) < 0 \Rightarrow G(T-1) < 0.$$

Now note that $G(T)$ can be written as

$$G(T) = [1 - (1-g)^T] \, [V(T) - \alpha_T V_1(0)] \qquad (4.9)$$

As $T \to \infty$, $\alpha_T V_1(0) \to 0$ and $G(T)$ will have the same sign as $V(T)$.
Therefore, if $V(\infty) > 0$, then $G(\infty) > 0$ and $V(T)$ is decreasing for
large $T$. Similarly, $V(\infty) \leq 0$ implies $V(T)$ increasing for large $T$.
From this observation and from (1) and (2) it follows that $V(T)$ has
one of the forms of Fig. 4.3 (noting that $V(0) = -\infty$). This completes
the proof of the theorem.
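
The key step of the proof, that $V(T) - V(T+1)$ is a positive multiple of $G(T)$ and that $G(T)$ changes sign only once, can be checked numerically. The sketch below uses hypothetical values ($g = 0.1$, $c = 5$, $V_1(0) = 100$, $\alpha_t = 0.9^t$) and hypothetical function names; it illustrates the identity and is not a substitute for the proof.

```python
def V(T, g, c, alpha, V0):
    """Net expected payoff V(T) with V1(t) = alpha(t) * V0."""
    q = 1.0 - g
    return (sum(g * q**t * alpha(t) * V0 for t in range(T)) - c) / (1.0 - q**T)

def G(T, g, c, alpha, V0):
    """G(T): g*V0*sum_{t<T} alpha_t q^t - alpha_T*V0*(1 - q^T) - c."""
    q = 1.0 - g
    return (g * V0 * sum(alpha(t) * q**t for t in range(T))
            - alpha(T) * V0 * (1.0 - q**T) - c)

# Hypothetical parameters with always-perishing information.
g, c, V0 = 0.1, 5.0, 100.0
alpha = lambda t: 0.9**t
q = 1.0 - g

residuals = []
for T in range(1, 30):
    lhs = V(T, g, c, alpha, V0) - V(T + 1, g, c, alpha, V0)
    rhs = g * q**T / ((1 - q**T) * (1 - q**(T + 1))) * G(T, g, c, alpha, V0)
    residuals.append(abs(lhs - rhs))

# True where G(T) > 0, i.e. where V(T) is decreasing.
signs = [G(T, g, c, alpha, V0) > 0 for T in range(1, 30)]
```

In this example `signs` is false for $T = 1, 2, 3$ and true from $T = 4$ on, so $V(T)$ increases up to $T = 4$ and decreases afterwards.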


Theorem 4.2. The necessary and sufficient condition for the optimal
information recovery period ($T^*$) is that the residual value of information
immediately before information recovery, $V_1(T^*)$, be equal to the
expected net payoff of the policy, $V(T^*)$.

Proof: We showed in Theorem 4.1 that $G(T)$, as defined in (4.8),
is negative if $V(T)$ is increasing and is positive if $V(T)$ is decreasing.
From Fig. 4.3 it follows that $G(T)$ has one of the forms shown in
Fig. 4.4. If $V(\infty) > 0$, then $T^*$ is unique and finite and we must have:

$$G(T^*) = 0 \qquad (4.10)$$

This is the necessary and sufficient condition for $T^*$ because of the
uniqueness of $T^*$. If $V(\infty) < 0$, there is no optimum and $G(T)$
is always negative. For $V(\infty) = 0$, $G(T)$ is zero at $T = \infty$. $T = \infty$
may be considered as optimum although the payoff is the same as buying
no information at all. Therefore (4.10) is always a necessary and
sufficient condition for the optimal information recovery period $T^*$.

Figure 4.4 $G(T)$ for $V(\infty) > 0$ and $V(\infty) \leq 0$.

From (4.9) and (4.10) we have


$$V(T^*) = \alpha_{T^*} \cdot V_1(0) = V_1(T^*) \qquad (4.11)$$

Therefore,

$$T^* \text{ optimum} \iff V(T^*) = V_1(T^*) \qquad (4.12)$$

Since we have normalized our payoff function such that the expected
payoff with no information is zero, $V_1(t)$ is in fact the value of our
previously obtained information at time $t$, and we often refer to it as
the residual value of information and denote it by $R(t)$. Therefore (4.12)
may be written as

$$V(T^*) = R(T^*) \qquad (4.13)$$

According to Theorem 4.2 the net expected payoff of a contingency
decision is at most equal to the (gross) expected payoff if the decision
occurs when our information is at its lowest point (Fig. 4.5). The extra
payoff of the decision if it occurs at other points of time will just be
enough to compensate for the cost of information. We can also make the
following remarks from Theorem 4.2.

*Note that $T^*$ which satisfies Eq. (4.11) is not, in general, an
integer. The time in our system is discrete, however. Since $G(T) = 0$
implies $V(T) = V(T+1)$, it follows that the actual optimum $T$ (integer)
is between $T^*$ (from Eq. (4.11)) and $T^* + 1$. Therefore $T^*$ found in
(4.11) must be rounded up to find the actual optimum $T$, but we often
ignore this discrepancy and regard $T^*$ as optimum.

Remark 1. The net expected payoff of the contingency decision with an
optimal recovery period ($T^*$) is the same as the expected payoff in a
normal decision (known time) with information having a delay $T^*$.

This is immediate from (4.12) since $V_1(t)$ is the expected payoff
with the information having a delay $t$.

Remark 2. If the optimal information recovery period $T^*$ increases
(as a result, for instance, of an increase in the cost of information),
then the net expected payoff (with optimum recovery period) decreases at
the same rate as the rate of information perishing of the system. This
is again immediate from (4.12).

Remark 3. If there is any information recovery it will be before
the past information is completely perished.

From (4.12), $V(T^*) > 0$ if and only if $V_1(T^*) > 0$, that is,
if the information is not completely perished at recovery times.


4.2 Necessary Condition for Optimality in General


In the following we obtain a necessary condition for optimal
information recovery in the general case, namely when $g_t$ is not constant
in time and information may be enhanced during some intervals. The
horizon may be finite or infinite.

Suppose $S^* = \{0, t_1^*, t_2^*, \ldots\}$ is an optimal information recovery
schedule ($t_i^*$ being the time of the $i$th recovery). If the time of one
of the recoveries is changed from its optimum value, the net expected
payoff decreases (or remains unchanged), no matter how the timing of the
other recoveries is modified. Let us define

$V_t(t_i)$ = maximum net expected payoff from time $t$ on with the next
recovery at $t_i$ ($t_i \geq t$), given the previous recoveries schedule. (4.14)

$V_t(t_i)$ reaches its maximum at $t_i = t_i^*$ for all $t \in [t_{i-1}^*, t_i^*]$ (see
Fig. 4.6). This is true because our state of information does not change in
the interval $(t_{i-1}^*, t_i^*)$. Now note from Fig. 4.6 that the equation

$$V_t(t_i) = V_t(t_i + 1) \qquad (4.15)$$
can hold only for some $\hat{t}_i$ such that $\hat{t}_i < t_i^* < \hat{t}_i + 1$. Since $t_i^*$
must be an integer (because time is discrete in our system), $t_i^*$ is
in fact equal to $\hat{t}_i$ when the latter is rounded up. Therefore (4.15)
can be regarded as the equation for the optimum $t_i^*$, keeping in mind that
the $t_i^*$ obtained from this equation must be rounded up. Therefore
we write

$$V_t(t_i^*) = V_t(t_i^* + 1) \qquad (4.16)$$

Letting $t = t_i^*$, $V_{t_i^*}(t_i^* + 1)$ can be written as

$$V_{t_i^*}(t_i^* + 1) = g_{t_i^*} \cdot R(t_i^*) + (1 - g_{t_i^*}) \cdot V_{t_i^*+1}(t_i^* + 1) \qquad (4.17)$$

$R(t)$ is the residual value of information at time $t$; therefore, the
first term on the right-hand side of (4.17) is the expected payoff if the
decision occurs at $t_i^*$. The second term is the expected payoff from
time $t_i^* + 1$ on. From (4.16) and (4.17) we have

$$V_{t_i^*}(t_i^*) = g_{t_i^*} \cdot R(t_i^*) + (1 - g_{t_i^*}) \cdot V_{t_i^*+1}(t_i^* + 1) \qquad (4.18)$$

Simplifying our notation by letting $V(t) = V_t(t)$, the optimality
condition (4.18) is written as:

$$V(t^*) = g_{t^*} \cdot R(t^*) + (1 - g_{t^*}) \cdot V(t^* + 1) \qquad (4.19)$$

Figure 4.6 Optimum information recovery time $t_i^*$.



Recall that

$V(t)$ = maximum expected payoff from $t$ on with recovery at $t$

and

$R(t)$ = residual value of information at $t$.

Notice also that for the special case of the previous section, namely
when $g_t$ is constant and the horizon is infinite, $V(t^*) = V(t^* + 1)$,
and (4.19) reduces to

$$V(t^*) = R(t^*)$$

which is the condition we had obtained for that case.

4.3 Interpretation of the Optimality Condition

$V(t)$ can be written as

$$V(t) = g_t \cdot V_1'(t) + (1 - g_t) [V_\delta(t+1 \mid t) + c] - c \qquad (4.20)$$

where $V_1'(t)$ is the expected payoff if the decision occurs at $t$ with
information being recovered at $t$, and $V_\delta(t+1 \mid t)$ is the net expected
payoff from $t+1$ on, given that information is recovered at $t$ (and
the decision did not occur at $t$). $c$ is the cost of information
recovery. Substituting for $V(t^*)$ from (4.19) into (4.20) we obtain

$$g_{t^*} [V_1'(t^*) - R(t^*) - c] = (1 - g_{t^*}) \cdot [V(t^* + 1) - V_\delta(t^* + 1 \mid t^*)] \qquad (4.21)$$
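
The substitution leading to (4.21) can be spelled out in one intermediate step. The lines below are a sketch of that algebra, writing $V_1'$ for the payoff with information recovered at $t$ and $V_\delta$ for the net payoff from $t+1$ on given a recovery at $t$; these symbol choices are a reading of the notation around (4.20) and should be treated as an assumption.

```latex
\begin{align*}
V(t^*) &= g_{t^*} R(t^*) + (1-g_{t^*})\,V(t^*+1)
  && \text{(4.19)} \\
V(t^*) &= g_{t^*} V_1'(t^*) + (1-g_{t^*})\,[V_\delta(t^*+1 \mid t^*) + c] - c
  && \text{(4.20) at } t = t^* \\
       &= g_{t^*}\,[V_1'(t^*) - c] + (1-g_{t^*})\,V_\delta(t^*+1 \mid t^*)
  && \text{since } (1-g_{t^*})\,c - c = -g_{t^*}\,c \\
\Rightarrow\quad
g_{t^*}\,[V_1'(t^*) - R(t^*) - c] &= (1-g_{t^*})\,[V(t^*+1) - V_\delta(t^*+1 \mid t^*)]
  && \text{equating the two expressions}
\end{align*}
```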

The left-hand side of (4.21) can be thought of as the net expected loss
at $t^*$ of buying information slightly later ($t^* + 1$ rather than $t^*$).
The right-hand side of (4.21) is the net expected benefit, from $t^* + 1$
on, of buying information at $t^* + 1$ rather than at $t^*$. Therefore,
the optimality condition (4.21) states that $t^*$ is such that the
expected marginal loss at present ($t^*$) of buying information slightly
later ($t^* + 1$ rather than $t^*$) is equal to its expected marginal
benefit in the future (from $t^* + 1$ on).

If information is always perishing, the right-hand side of (4.21)
is always positive. Therefore

$$V_1'(t^*) - R(t^*) - c > 0$$

or

$$V_1'(t^*) - R(t^*) > c \qquad (4.22)$$

This inequality states that the net benefit at $t^*$ of buying information
at $t^*$ (assuming that the decision occurs at $t^*$) is always greater
than the cost of information. Intuitively, if the purchase of information
at time $t$ will not provide a net benefit for making the decision at
time $t$, there is no value in buying it at $t$, since we can always
buy it at $t+1$ without being worse off (assuming perishing information).
This argument is valid if the decision occurs only once. When a
decision is repeated in time, as we will see in Chapter 5, each purchase
of information may be used for more than one decision and (4.22) is not
true anymore.

4.4  A  Priori  Information  Recovery  in  the  Bidding  Example 

The perishing of information in the Bidding Example was studied
in Chapter 2 (Sec. 2.8). We found that the information about the cost
of performing the bidding contract ($p$) perishes at a constant rate
(Eq. (2.49)):

$$V_1(t) = V_1(0) \cdot \lambda^{2t} \qquad (4.23)$$

$\lambda$ is a constant less than one. Suppose that the bidding occurs only
once and the probability that it occurs at any time $t$ (given that it
did not occur before) is constant ($g$). Assuming an infinite horizon
we can use the optimality condition (4.12) to find the optimal information
recovery period. We have

$$V(T^*) = V_1(T^*)$$

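$V(T)$, namely the net expected payoff with recovery period $T$, is from (4.6)

$$V(T) = \frac{\sum_{t=0}^{T-1} g(1-g)^t \cdot V_1(t) - c}{1 - (1-g)^T}$$

Substituting for $V_1(t)$ from (4.23) we have

$$V(T) = \frac{g V_1(0) \sum_{t=0}^{T-1} (1-g)^t \lambda^{2t} - c}{1 - (1-g)^T}
= \frac{g V_1(0) [1 - ((1-g)\lambda^2)^T] - c \, [1 - (1-g)\lambda^2]}{[1 - (1-g)^T][1 - (1-g)\lambda^2]}$$

Substituting $V(T)$ into the optimality condition (4.12) and simplifying,
we find

$$\lambda^{2T^*} [1 - (1-u)(1-g)^{T^*}] = \frac{u V_1(0) - c}{V_1(0)} \qquad (4.24)$$

where $u = g / [1 - (1-g)\lambda^2]$ is constant ($0 \leq u < 1$). Since
$(1-u)(1-g)^{T^*}$ is often small compared to 1, we have

$$\lambda^{2T^*} \approx \frac{u V_1(0) - c}{V_1(0)}$$

or

$$T^* \approx \frac{\log \left[ \dfrac{u V_1(0) - c}{V_1(0)} \right]}{\log \lambda^2}$$

but since the rate of information perishing ($P$) is $1/\lambda^2$, we have

$$T^* \approx \frac{\log \left[ \dfrac{V_1(0)}{u V_1(0) - c} \right]}{\log P}$$
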


It is easy to show that $u V_1(0) - c$ is the net expected payoff if the
information is bought only once. Therefore $T^*$ is determined from (1) the
value of information when bought only once (normalized by the value of fresh
information $V_1(0)$), and (2) the rate of information perishing $P$. We
buy more information ($T^*$ smaller) if the value of information (when bought
only once) increases or if the rate of information perishing ($P$) increases.
An increase in the cost of information ($c$) is reflected directly in the
value of information (when bought only once) and increases $T^*$. The
changes in $T^*$ as a result of changes in $g$ and $\lambda$ (for constant $c$)
can be seen in Fig. 4.7. An increase in $g$ would always result in
buying more information, since the value of information (when bought only
once) increases. Changes in the value of $\lambda$ will influence $T^*$ in two
opposite ways. Suppose $\lambda$ decreases; this results in an increase in the
rate of information perishing, which tends to decrease $T^*$. On the other
hand, the decrease in $\lambda$ would lower the value of information (when bought
only once), which tends to increase $T^*$. For large values of $g$ the
first effect is dominant. The second effect becomes dominant, however,
when $g$ is small (Fig. 4.7).
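
The quality of the closed-form approximation for $T^*$ can be examined numerically. The sketch below is illustrative only: it solves the exact condition $\lambda^{2T}[1 - (1-u)(1-g)^T] = (u V_1(0) - c)/V_1(0)$ by bisection and compares the root with the approximate expression obtained by dropping the term $(1-u)(1-g)^T$. The parameter values and the function names are assumptions chosen for the example, and `lam2` stands for $\lambda^2$.

```python
import math

def u_factor(g, lam2):
    """u = g / (1 - (1 - g) * lambda^2)."""
    return g / (1.0 - (1.0 - g) * lam2)

def approx_period(g, lam2, c, V0):
    """Approximate T*: lam2**T = (u*V0 - c) / V0."""
    u = u_factor(g, lam2)
    return math.log((u * V0 - c) / V0) / math.log(lam2)

def exact_period(g, lam2, c, V0, hi=1000.0):
    """Root of lam2**T * (1 - (1-u)*(1-g)**T) - (u*V0 - c)/V0 by bisection.

    Assumes the left-hand side is above the target at T = 0 and below it
    at T = hi, which holds for the example parameters used here.
    """
    u = u_factor(g, lam2)
    target = (u * V0 - c) / V0
    f = lambda T: lam2**T * (1.0 - (1.0 - u) * (1.0 - g)**T) - target
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: large g, slowly perishing information.
g, lam2, c, V0 = 0.5, 0.98, 20.0, 100.0
T_approx = approx_period(g, lam2, c, V0)
T_exact = exact_period(g, lam2, c, V0)
```

For these values the two results agree closely (both a little above 12), because the neglected term is tiny when $g$ is large. For small $g$ the neglected term need not be small, and the approximate value should then be checked against the exact condition before it is used.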


4.5 Optimal Information Recovery with Risk Aversion

The optimality condition for information recovery found earlier
in this chapter for an expected-value decision maker can easily be
extended to the case of a risk-averse decision maker. By the same
argument as in Section 4.2, but with expected utilities substituting
for expected values, the optimality condition (4.16) is written as

$$U_t(t_i^*) = U_t(t_i^* + 1) \qquad (t < t_i^*) \qquad (4.26)$$

where $U_t(t_i)$ is the expected utility of the payoff from time $t$
on, with the next recovery at $t_i$. The optimality condition
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for  the  infinite-horizon  case  can  be  obtained  by  exactly  the  same 
method  as  in  the  case  of  an  expected-value  decision  maker.  We  find 

U(v (T*))  -  g  •  U(r(T*))  +  (1-g)  •  U(v (T*))  (4.27) 

where  r(T)  is  the  residual  lottery  at  T  and  v(T)  is  the  future 
net  payoff  lottery  with  recovery  period  T  .  Equation  (4.27)  reduces  to 

U(u(T*))  =  U(r(T*))  (4.28) 

Comparing this condition with that of an expected-value decision
maker (Eq. (4.13)),

    $V(T^*) = R(T^*)$

we note that for an expected-value decision maker the expected
values of the residual lottery and the future payoff lottery are equal
(at $T^*$), whereas for a risk-averse decision maker the utilities of
the two lotteries must be equal. Note, however, that the two decision
makers are not faced with the same lotteries. In other words, both
$r(T)$ and $v(T)$ are different for the two decision makers (because
their optimum decisions ($d^*$) are different). This makes any comparison
of the optimal information recovery policies between the two cases
very difficult. In particular, there is no easy way to discover
whether (or under what conditions) a risk-averse decision maker will buy
more or less information than an expected-value decision maker. For
a simple Markovian state and a quadratic payoff function, the optimal
information recovery periods for the two decision makers were calculated
and compared for various values of the parameters involved. In all the
cases calculated, the information recovery period for the risk-averse
decision maker was found to be larger than the one for the
expected-value decision maker. This suggests that a risk-averse
decision maker buys less information than an expected-value decision
maker when there is uncertainty regarding the time of the decision.
Whether (or under what conditions) this holds in general remains uncertain.

4.6 Summary

In  this  chapter  we  have  investigated  the  a  priori  optimal  informa¬ 
tion  recovery  policies  for  a  one-time  contingent  decision.  Most  of  our 
results  concern  the  infinite-horizon  case.  For  this  case  it  was  shown 
that  an  optimal  information  recovery  period  obtained  from  local  optimality 
conditions  would  be  a  global  optimum,  if  information  is  always  perishing. 
This  condition  is  generally  required  to  make  a  local  optimum  a  global  one. 
A  necessary  condition  for  optimality  was  found  for  an  information  recovery 
policy  in  general.  This  condition  can  be  interpreted  as  follows:  At  an 
optimal  information  recovery  time  the  expected  marginal  loss  at  present 
(the  recovery  time)  of  buying  information  slightly  later  is  equal  to  its 
expected  marginal  gain  in  the  future.  For  the  infinite  horizon  case  the 
optimality  condition  reduces  to  a  simple  form  and  states  that  the  net 
expected  payoff  of  the  decision  is  equal  to  the  expected  payoff  of  the 
decision if it were to occur when the information is at its lowest point
(immediately before each recovery). The extra payoff of the decision,
if it occurs at other points of time, will be just enough to pay for the
cost of the information.

The optimal information recovery period for the Bidding Example of
Chapter 2 was found to be determined by (1) the value of information,
when it is bought only once, and (2) the rate of information perishing.
The optimal recovery period decreases (buying more information) if
either of these factors increases.

The  effect  of  risk  aversion  on  the  optimal  information  recovery 
policies  was  studied  briefly.  The  optimality  condition  for  an  expected- 
value  decision  maker  was  extended  to  the  case  of  a  risk-averse  decision 
maker.  It  was  found  very  difficult  to  make  any  comparison  between  the 
two  cases,  however,  because  the  payoff  lotteries  are  different  for  the 
two  decision  makers.  For  a  simple  Markovian  state  and  a  quadratic 
payoff  function,  the  optimal  information  recovery  periods  for  an  expected- 
value  and  a  risk-averse  decision  maker  were  calculated  and  compared  for 
various values of the parameters involved. In all cases the information
recovery period for a risk-averse decision maker was found larger than
the one for the expected-value decision maker. This suggests that a risk-
averse decision maker buys less information than an expected-value decision
maker, when there is uncertainty regarding the time of the decision.
Whether (or under what conditions) this is in fact true remains uncertain.


CHAPTER  5 

A  POSTERIORI  OPTIMAL  INFORMATION  RECOVERY 

In  this  chapter  the  a  posteriori  policies  for  recovery  of  infor¬ 
mation  are  investigated.  For  this  type  of  policy,  as  mentioned  in 
Chapter  3,  the  result  of  the  previous  observations  is  used  in  deciding 
the  time  of  the  next  observation(s) .  The  optimality  condition  is  found 
to  be  similar  to  that  of  the  a  priori  policies.  We  will  define  the 
"value  of  new  information"  which  is  found  to  be  more  appropriate  than 
"the  residual  value  of  past  information"  for  studying  the  a  posteriori 
policies.  Using  the  new  definition  we  will  find  conditions  under  which 
the  two  types  of  policies  coincide. 

5.1  Optimality  Conditions  for  A  Posteriori  Policies 

It is easily seen that the same argument which was used in Chapter 4
(Sec. 4.2) to obtain the optimality condition for the a priori case
can be used here, except that all the a priori payoffs must be replaced
by the a posteriori payoffs which depend on the result of the previous
observations. For a Markovian system and a perfect observation the only
information needed at each point of time is the result of the last
observation of the system. Consequently, the computations are greatly
reduced in this case. Our results in this chapter are general, however,
and do not require the Markovian property. If $Z_0$ is the result of all
the previous observations (up to the last observation at time zero), the
optimality condition (4.15), when written for the a posteriori case, is

    $V_t(t_i^*, Z_0) = V_t(t_i^* + 1, Z_0)$    (5.1)

where

    $V_t(t_i, Z_0)$ = maximum net expected payoff from time $t$ on, with
                     the next observation at $t_i$ ($t_i \ge t$), given
                     $Z_0$.    (5.2)

Note that $V_t(t_i)$ defined for the a priori case is the expected value
of $V_t(t_i, Z_0)$ over values of $Z_0$. From (5.1) the necessary
condition for optimum recovery time ($t^*$) can be obtained in exactly the
same way as in the a priori case. We get

    $V(t^*, Z_0) = g_{t^*} \cdot R(t^*, Z_0) + (1 - g_{t^*}) \cdot V(t^*+1, Z_0)$    (5.3)

where $V(t, Z_0) = V_t(t, Z_0)$ (defined in (5.2)), and

    $R(t, Z_0)$ = residual value of information at time $t$ given
                 that the result of the previous observations
                 was $Z_0$.

Equation (5.3) has exactly the same form as the optimality condition for
the a priori case (Eq. (4.17)), but with a posteriori payoffs substituted
for the a priori ones. Consequently the same interpretation as in the
a priori case holds for this case, namely that $t^*$ is such that the
marginal expected loss at present ($t^*$) of buying information slightly
later ($t^*+1$ rather than $t^*$) is equal to its marginal benefit in the
future (from $t^*+1$ on). The benefits and losses, however, are
a posteriori ones, depending on the result of the previous observations,
$Z_0$.


5.2 Calculation of $t^*(Z_0)$

If $t^*$ depends on $Z_0$, there is no easy way to find $t^*(Z_0)$ in
general. The optimality condition found may be helpful but is not often
sufficient for calculating $t^*(Z_0)$. For the finite-horizon case we may
use dynamic programming to find $t^*(Z_0)$. For the infinite-horizon case
the method of policy iteration [7] may be used. Both cases are briefly
explained below.

1. Finite Horizon: To use the backward dynamic programming
method we have to change our definitions slightly. Let us define:

    $V_t(Z_{t-\tau})$ = maximum expected payoff from time $t$ on, given
                       $Z_{t-\tau}$;

    $R_t(Z_{t-\tau})$ = expected residue of information at time $t$,
                       given $Z_{t-\tau}$;

    $V'_t(Z_{t-\tau})$ = expected payoff at time $t$ with an observation
                        at $t$, given $Z_{t-\tau}$.

The decision in each period is whether or not to buy new information at
that time, as shown in Fig. 5.1.

Figure 5.1. Decision at time t in dynamic programming formulation.

$W_A(t)$ and $W_B(t)$ are the expected payoffs (at $t$) with and without an
observation at $t$, respectively. We have

    $V_t(Z_{t-\tau}) = \max\{W_A(t), W_B(t)\}$    (5.4)

At horizon $N$ we have

    $V_N(Z_{N-\tau}) = \max\{R_N(Z_{N-\tau}),\; V'_N(Z_{N-\tau}) - c\}$

Working backward we can calculate $V_t(Z_{t-\tau})$ for
$t = N-1, N-2, \ldots, 1, 0$. The burden of computation increases rapidly
as $N$ or the number of possible values of $Z$ increases.
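
As a rough illustration of the backward recursion in (5.4), the following Python sketch solves a deliberately simplified toy version of the problem. The model is an assumption made only for this example and is not taken from the text: the decision occurs in each period with a constant probability `g` (and with certainty at the horizon), information that is `age` periods old is worth `sigma * lam**(2*age)` when the decision occurs, and a purchase costs `c` and resets the age to zero. In this toy model the payoffs do not depend on the observed values, so the state reduces to the pair (t, age). All function and variable names are hypothetical.

```python
def solve_recovery_dp(N, g, lam, sigma, c):
    """Backward dynamic programming for a toy information-recovery problem.

    Assumed toy model (not from the text): information of a given age is
    worth sigma * lam**(2*age) when the decision occurs; the decision occurs
    in each period with probability g, and with certainty at the horizon N.
    Returns (V, buy), where V[t][age] is the maximum expected payoff from
    time t on and buy[t][age] tells whether buying at t is strictly better.
    """
    def residual(age):
        # Expected payoff if the decision occurs with information of this age.
        return sigma * lam ** (2 * age)

    V = [[0.0] * (N + 1) for _ in range(N + 1)]
    buy = [[False] * (N + 1) for _ in range(N + 1)]

    # Horizon: compare using the old information with buying fresh information.
    for age in range(N + 1):
        w_buy = residual(0) - c
        w_skip = residual(age)
        V[N][age] = max(w_buy, w_skip)
        buy[N][age] = w_buy > w_skip

    # Work backward; at time t the age of the information is at most t.
    for t in range(N - 1, -1, -1):
        for age in range(t + 1):
            w_buy = -c + g * residual(0) + (1 - g) * V[t + 1][1]
            w_skip = g * residual(age) + (1 - g) * V[t + 1][age + 1]
            V[t][age] = max(w_buy, w_skip)
            buy[t][age] = w_buy > w_skip
    return V, buy


if __name__ == "__main__":
    V, buy = solve_recovery_dp(N=20, g=0.2, lam=0.9, sigma=1.0, c=0.3)
    # First age at which buying becomes optimal at the horizon, if any.
    first_age = next((a for a in range(21) if buy[20][a]), None)
    print("V[0][0] =", V[0][0], "first buying age at horizon:", first_age)
```

The table has on the order of N squared entries even in this reduced form. When the payoffs also depend on the observed values, each entry must additionally be indexed by those values, which is the growth in computation mentioned above.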

2. Infinite Horizon: For the infinite-horizon case the method
of policy iteration may be used to obtain $t^*(Z_0)$. We can use our
optimality condition (Eq. (5.3)) to simplify the calculations. The
optimality condition is:

    $V(t^*, Z_0) = g_{t^*} \cdot R(t^*, Z_0) + (1 - g_{t^*}) \cdot V(t^*+1, Z_0)$    (5.5)

Also we have

    $V(0, Z_0) = \sum_{t=0}^{t^*-1} p_t \cdot R(t, Z_0) + p_{\ge t^*} \cdot V(t^*, Z_0) - c$    (5.6)

This condition follows from the optimality of $t^*$ and splits the maximum
expected payoff from time zero on ($V(0, Z_0)$) into the expected payoff
from 0 to $t^*-1$ and the expected payoff from $t^*$ on ($V(t^*, Z_0)$).
$p_t$ is the probability that the decision will occur at time $t$ and
$p_{\ge t^*}$ is the probability that it will occur at or after $t^*$.
$t^*(Z_0)$ can be found by the following procedure:

(1) Select (guess) a policy $t^* = f^1(Z_0)$, calculate $V^1(0, Z_0)$
    from (5.6) by setting $V(t^*, Z_0)$ at an arbitrary value,
    and iterate until convergence.

(2) Calculate $V^1(t^*, Z_0)$ and $V^1(t^*+1, Z_0)$ from $V^1(0, Z_0)$
    and substitute in Eq. (5.5) to find the improved policy
    $t^* = f^2(Z_0)$.

(3) Continue (1) and (2) until $t^* = f^i(Z_0)$ converges to the
    optimal policy $t^* = f(Z_0)$.


This method, although useful, does not guarantee convergence to the
optimum for our problem, and convergence problems often arise.
Fortunately, for an important class of problems $t^*$ is not a function of
$Z_0$ and is relatively easy to calculate. This class of problems,
however, is not easily distinguished in our present formulation, in which
the "residual value of past information" is the main variable. In the
following we will introduce a new variable which may be thought of as a
dual to the residual value of past information. This variable is found
to be much more appropriate for the study of the a posteriori recovery of
information. In particular, using this variable makes it easier to
distinguish the class of problems for which $t^*$ does not depend on $Z_0$.

5.3 Value of New Information

The new variable is the "value of new information" at each point of
time (given our state of information at that time), and is shown in
Fig. 5.2. $V_1(t, Z_0) = R(t, Z_0)$ is the expected payoff of the decision
at time $t$, given that $Z_0$ was observed at the previous observations
of the state. $V'_1(t, Z_0 | T)$ denotes the expected payoff of the
decision at time $t$ with new information at $T$, given $Z_0$. The
difference

    $W_1(t, Z_0 | T) = V'_1(t, Z_0 | T) - V_1(t, Z_0)$    (5.7)

is the expected value of the new information (at $T$) for making the
decision at time $t$.

Figure 5.2. Value of new information $W_1(t, Z_0 | T)$.

We will substitute $W_1(t, Z_0 | T)$ for $V_1(t, Z_0)$ in
our formulations. Therefore, rather than seeing how valuable our
information is from the past ($V_1(t, Z_0)$), we see how valuable the new
information is at each point of time ($W_1(t, Z_0 | T)$). In addition to
simplifying the optimality condition, the new variable has the advantage
that while $V_1(t, Z_0)$ and $V'_1(t, Z_0 | T)$ always depend on $Z_0$, the
difference $W_1(t, Z_0 | T)$ does not depend on $Z_0$ for an important class
of problems. As we will see in Section 5.5, this has the important
implication that $t^*$ is independent of $Z_0$ for this class of problems.
Working with $W_1(t, Z_0 | T)$ we can immediately see whether or not $t^*$
depends on $Z_0$. This is not the case if we work with $V_1(t, Z_0)$, and
we may get involved in difficult computations even if $t^*$ is not a
function of $Z_0$ and therefore can be found in much easier ways.

5.4 Optimality Condition in Terms of "Value of New Information"

The necessary condition for optimality (Eq. (5.3)) can be written
in terms of the value of new information. Let us define $W(t, Z_0)$ as

    $W(t, Z_0) = V(t, Z_0) - \sum_{t'=t}^{N} p_{t'} \cdot V_1(t', Z_0)$    (5.8)

Recall that $V(t, Z_0)$ was the maximum expected payoff from time $t$ on
(with recovery at $t$), given $Z_0$. The summation on the right-hand
side is the net expected payoff from $t$ on, if no more information is
bought ($p_{t'}$ is the probability that the decision will occur at $t'$,
and $N$ is the horizon). Therefore $W(t, Z_0)$ is the net value of all
future purchases of information (including the first one at time $t$).
Notice that $W(t, Z_0)$ is the counterpart of $V(t, Z_0)$ in the new
formulation. Solving Eq. (5.8) for $V(t, Z_0)$ and substituting in the
optimality condition (5.3), and recalling that $V_1(t, Z_0) = R(t, Z_0)$,
we have

    $W(t^*, Z_0) + \sum_{t'=t^*}^{N} p_{t'} \cdot R(t', Z_0) = g_{t^*} \cdot R(t^*, Z_0) + (1 - g_{t^*}) \cdot \left[ W(t^*+1, Z_0) + \sum_{t'=t^*+1}^{N} p_{t'} \cdot R(t', Z_0) \right]$

By subtracting $\sum_{t'=t^*}^{N} p_{t'} \cdot R(t', Z_0)$ from both sides
we find

    $W(t^*, Z_0) = (1 - g_{t^*}) \cdot W(t^*+1, Z_0)$    (5.9)

This equation is the necessary condition for optimality written in terms
of the value of new information and has a much simpler form than Eq. (5.3).

5.5 Independence of the Optimal Recovery Time ($t^*$) from the Result
    of the Previous Observations ($Z_0$)

As mentioned earlier, if $t^*$ depends on the realization of the
previous observations ($Z_0$), it is often very difficult to calculate. We
also mentioned that, fortunately, for an important class of problems $t^*$
does not depend on $Z_0$ and is therefore relatively easy to calculate. In
this section we will find conditions under which $t^*$ is independent of
$Z_0$. We will then identify the mentioned class of problems.

Theorem 5.1. If the (total) expected value of new information when
bought only once is always independent of the realization of the previous
observations ($Z_0$), then the optimal recovery time is independent of $Z_0$.

Proof: The (total) expected value of new information, when bought
only once, is the net expected gain from the time of the information
recovery ($T$) up to the horizon ($N$) assuming that no more information
is bought in this interval (Fig. 5.3). Denoting this value by
$W_1(T, Z_0)$ we have

    $W_1(T, Z_0) = \sum_{t=T}^{N} p_t \cdot W_1(t, Z_0 | T) - c$

where $p_t$ is the probability of the occurrence of the decision at time
$t$ and $c$ is the cost of the information.

Figure 5.3. Value of new information when bought only once.

Figure 5.4. Value of future purchases of information.

If information is recovered more than once (Fig. 5.4), the expected
benefit ($W(0)$) is the sum of the expected value of each piece of
information when bought alone over all information recoveries:


w(0)  (5'10) 

where $p_{t > t_i}$ is the probability that the decision occurs after the
$i$th recovery. The information available at time $t_i$ is $Z_{t_{i-1}}$,
which is shown, for simplicity, by $Z_{i-1}$. Now suppose that
$\hat{t}_i(Z_{i-1})$ is the optimal time for the $i$th recovery as a
function of the realization of the previous observations. The maximum
expected value (at $t = 0$) of all future information recoveries is

If the value of new information when bought only once ($W_1(t, Z)$) does
not depend on the realization of the previous observations (but on $t$ only)


Now  suppose  that  the  information  recovery  schedule  (t 


defined  as 


and  since  (5.13)  is  true  regardless  of  values  of  Z 


follows  that 


but $W(t_1^*, t_2^*, \ldots)$ is the expected benefit without using the
result of observations ($Z_{i-1}$) in deciding $t_i$ (namely the maximum
expected payoff of the a priori policy). It cannot be greater than the
expected payoff of the a posteriori policy, namely $W(0, Z_0)$.
Therefore, from (5.14) we must have

    $W(0, Z_0) = W(t_1^*, t_2^*, \ldots)$.

It follows, therefore, that the schedule $\{t_1^*, t_2^*, \ldots\}$ is also
optimum for the a posteriori information recovery. Therefore,
$\hat{t}_i(Z_{i-1})$ is independent of $Z_{i-1}$.

Theorem 5.2. For a quadratic payoff function, if the state $s(t)$
and the observation $z(t)$ have normal distribution, then the optimal
information recovery period is independent of the realization of the
previous observations.


Proof: We show that for a quadratic payoff function and a normal
state (and observation), the expected value of new information, when bought
only once, is independent of the realization of the previous observations
($Z_0$). It then follows from Theorem 5.1 that the optimal recovery time
is independent of $Z_0$. For a quadratic payoff function the expected
payoff was found in Chapter 2. We showed that (Eq. (2.10))

    $V_1(y) = \bar{s}'(y) \cdot M \cdot \bar{s}(y)$    (5.15)

where M = -1/2 GH' (G and H are coefficient matrices of the payoff
function, (2.5)), and $\bar{s}(y)$ is the posterior mean of the state
vector $s$, given information $y$:

    $\bar{s}(y) = \langle s | y, \mathcal{E} \rangle$    (5.16)

In Fig. 5.5 $V_1(t, Z_0)$ is the expected payoff at time $t$ given $Z_0$.
$V'_1(t, Z_0 | T)$ is the expected payoff at time $t$ ($t \ge T$) with new
information at $T$, given $Z_0$.

Figure 5.5. Value of new information when bought only at $T$.

The total expected gain of buying new information is:

    $W_1(T, Z_0) = \sum_{t \ge T} p_t \cdot W_1(t, Z_0 | T) - c$    (5.17)

where

    $W_1(t, Z_0 | T) = V'_1(t, Z_0 | T) - V_1(t, Z_0)$    (5.18)

$p_t$ is the probability that the decision will occur at time $t$ and $c$
is the cost of the information. From Eq. (5.18) we can see that if
$W_1(t, Z_0 | T)$ is always (for all $t$ and $T$) independent of $Z_0$, then
$W_1(T, Z_0)$ will be independent of $Z_0$. Therefore it is sufficient to
show that $W_1(t, Z_0 | T)$ is independent of $Z_0$. $V_1(t, Z_0)$ and
$V'_1(t, Z_0 | T)$ can be calculated using Eq. (5.15). We have

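    $V_1(t, Z_0) = \langle s'(t) | Z_0, \mathcal{E} \rangle \cdot M \cdot \langle s(t) | Z_0, \mathcal{E} \rangle$    (5.19)

and

    $V'_1(t, Z_0 | T) = \langle \bar{s}'(t) \cdot M \cdot \bar{s}(t) | Z_0, \mathcal{E} \rangle$    (5.20)

where

    $\bar{s}(t) = \langle s(t) | z_T, Z_0, \mathcal{E} \rangle$

Equation (5.20) can be written as

    $V'_1(t, Z_0 | T) = \mathrm{tr}(M \Sigma_{\bar{s}}) + \langle \bar{s}'(t) | Z_0, \mathcal{E} \rangle \cdot M \cdot \langle \bar{s}(t) | Z_0, \mathcal{E} \rangle$    (5.21)

where $\Sigma_{\bar{s}}$ is the matrix of the covariances of $\bar{s}(t)$:

    $\Sigma_{\bar{s}} = \mathrm{cov}(\bar{s}(t) | Z_0, \mathcal{E})$

Noting that

    $\langle \bar{s}(t) | Z_0, \mathcal{E} \rangle = \langle \langle s(t) | z_T, Z_0, \mathcal{E} \rangle | Z_0, \mathcal{E} \rangle = \langle s(t) | Z_0, \mathcal{E} \rangle$    (5.22)

and in view of (5.19), (5.21) can be written as:

    $V'_1(t, Z_0 | T) = \mathrm{tr}(M \Sigma_{\bar{s}}) + V_1(t, Z_0)$.

Therefore

    $W_1(t, Z_0 | T) = V'_1(t, Z_0 | T) - V_1(t, Z_0) = \mathrm{tr}(M \Sigma_{\bar{s}})$    (5.23)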

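Now we show that if $s(t)$ and $z(t)$ have normal distribution,
$\Sigma_{\bar{s}}$ is independent of $Z_0$. To show this, we can use the
following relation which decomposes the prior variance into the variance
of the posterior mean and the expected value of the posterior variance [16]:

    $\mathrm{var}(x | \mathcal{E}) = \mathrm{var}(\langle x | y, \mathcal{E} \rangle | \mathcal{E}) + \langle \mathrm{var}(x | y, \mathcal{E}) | \mathcal{E} \rangle$.

Letting $x = s(t)$, $y = z_T$, and $\mathcal{E} = (Z_0, \mathcal{E})$, we have

    $\mathrm{var}(s(t) | Z_0, \mathcal{E}) = \mathrm{var}(\langle s(t) | z_T, Z_0, \mathcal{E} \rangle | Z_0, \mathcal{E}) + \langle \mathrm{var}(s(t) | z_T, Z_0, \mathcal{E}) | Z_0, \mathcal{E} \rangle$

but the first term on the right-hand side is $\Sigma_{\bar{s}}$; therefore,

    $\Sigma_{\bar{s}} = \mathrm{var}(s(t) | Z_0, \mathcal{E}) - \langle \mathrm{var}(s(t) | z_T, Z_0, \mathcal{E}) | Z_0, \mathcal{E} \rangle$

If $s(t)$ and $z(t)$ have normal distribution, then both
$\mathrm{var}(s(t) | Z_0, \mathcal{E})$ and
$\mathrm{var}(s(t) | z_T, Z_0, \mathcal{E})$ are independent of $Z_0$.
Therefore $\Sigma_{\bar{s}}$ is independent of $Z_0$ and the proof is
complete.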
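
The variance decomposition used in the proof can be checked numerically in a simple special case. The sketch below assumes a scalar state following the linear Markovian system $s(t) = \lambda s(t-1) + \epsilon(t)$ with stationary variance `sigma_s` and a perfect observation at time T, which is the setting of the Bidding Example rather than the general vector case of the theorem. The function names are hypothetical. The conditional variances are obtained by iterating the recursion, and none of them involves the observed value, which is the property the proof relies on.

```python
def conditional_variance(steps, lam, sigma_s):
    """Variance of the state `steps` periods after a perfect observation.

    Assumes s(t) = lam * s(t-1) + e(t), with the innovation variance chosen
    so that the stationary variance of s is sigma_s.
    """
    innovation_var = sigma_s * (1.0 - lam ** 2)
    var = 0.0  # the state is known exactly at the time of the observation
    for _ in range(steps):
        var = lam ** 2 * var + innovation_var
    return var


def variance_decomposition(t, T, lam, sigma_s):
    """Return (prior variance, variance of posterior mean, posterior variance).

    The prior is conditioned on an observation at time 0, the posterior on a
    further perfect observation at time T (0 <= T <= t).
    """
    prior = conditional_variance(t, lam, sigma_s)          # var(s(t) | s_0)
    posterior = conditional_variance(t - T, lam, sigma_s)  # var(s(t) | s(T))
    # The posterior mean is lam**(t-T) * s(T); its variance given s_0 follows
    # from the variance of s(T) given s_0.
    var_post_mean = lam ** (2 * (t - T)) * conditional_variance(T, lam, sigma_s)
    return prior, var_post_mean, posterior


if __name__ == "__main__":
    prior, var_post_mean, posterior = variance_decomposition(7, 3, 0.9, 2.5)
    # The two numbers printed should agree up to rounding error.
    print(prior, var_post_mean + posterior)
```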

5.6 A Posteriori Recovery of Information for the Bidding Example

The a priori recovery of information for the Bidding Example of
Chapter 2 was studied in Chapter 4. Here we will find the a posteriori
policy for the recovery of information for that example. The uncertain
variable of interest in the Bidding Example was the cost of performing
the bidding contract ($p$). We assumed that $p$ has a constant mean
over time, and that its variation from the mean ($\Delta p = s$) changes
over time according to the linear Markovian system

    $s(t) = \lambda \cdot s(t-1) + \epsilon(t)$    (5.24)

We have also assumed that our observation of $s$ is perfect. Consequently
the only information needed at each time is the result of the last
observation of $s$ ($s_0$). To find the optimum a posteriori policy for
updating our information about $s(t)$, let us first discover whether such a
policy would depend on the result of the previous observation ($s_0$). By
Theorem 5.1 if the value of new information when bought only once is
independent of the realization of the previous observation ($s_0$),
then the optimal time for the next recovery ($t^*$) is independent of
$s_0$. The expected gain at time $t$ of buying new information at $T$
is

    $W_1(t, s_0 | T) = V'_1(t, s_0 | T) - V_1(t, s_0)$

$W_1(t, s_0 | T)$ is shown in Fig. 5.6.

Figure 5.6. Value of new information bought at $T$, given $s_0$.

* See updating relations for normal random variables in [2], for example.


Substituting  for  V.(t-T,  s(T))  from  Eq.  (2.48),  we  find 


V’(t,so|T)  «  ^<(1  -  f  -  |  Xt_Ts(T))2|so,<fJ) 


f  (1  -  j)  -  (1  -  j)  X"  (s(T)|s.<?>  + 


|  X2(t_TV(T)|so,<f  ) 


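    (5.25)

From (5.24) it is easy to show that

    $\langle s(T) | s_0, \mathcal{E} \rangle = \lambda^T s_0$    (5.26)

    $\langle s^2(T) | s_0, \mathcal{E} \rangle = \lambda^{2T} s_0^2 + \sigma_s (1 - \lambda^{2T})$    (5.27)

where $\sigma_s$ is the variance of $s$. Substituting for
$\langle s(T) | s_0, \mathcal{E} \rangle$ and
$\langle s^2(T) | s_0, \mathcal{E} \rangle$ from (5.26) and (5.27), (5.25)
reduces to: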

'(t.s  It)  =  xts  )2  +  -§■  (i  -  X2T)X2(t-T) 

J.  O  L  i  2  O  o 


(5.28) 


From  this  equation  and  Eq.  (2.48)  we  have 


W  (t,s  IT)  *  v'(t,s  T)  -  V  (t ,s  ) 

1  O  1  o  1  o 


Y  (1  -  X2T)X2(t_T) 


(5.29) 


Notice that both $V_1(t, s_0)$ and $V'_1(t, s_0 | T)$ depend on $s_0$ but
their difference (value of new information at $t$) does not depend on $s_0$.
It follows that the value of new information when bought only once is
independent of $s_0$ and, therefore, $t^*$ is independent of $s_0$.
Equivalently, the a priori and the a posteriori information recovery
policies are the same. Therefore the periodic information recovery
policy found for the a priori case is also optimum for this case. The
optimum information-recovery period was found to be (Eq. (4.25)):


*  ~ 
T 


log 


a  /8 


lo8 


where $W$ is the (total) value of new information when bought only once
(given that the past information is completely perished) and $p$ is
the rate of information perishing. Note that the value of new
information when bought only once appears as an important variable in the
study of the optimal information recovery policies.

5.7 Bounds on $t^*(Z_0)$

It was mentioned in Section 5.2 that if $t^*$ depends on $Z_0$, it
is often difficult to calculate. In this section we find upper and
lower bounds for $t^*(Z_0)$. These bounds are easily obtained from the
notion of "value of new information" and are especially useful when $t^*$
depends on $Z_0$.

(1) Lower bound on $t^*(Z_0)$: Consider one purchase of information.
The expected benefit from this information is maximum if no more information
is bought in the future (because future information will demolish the
residual value of this information). This maximum benefit must be
at least as great as the cost of information ($c$). In other words, the net
value of information, when bought only once, must not be negative:

    $W_1(t^*, Z_0) = \sum_{t=t^*}^{N} p_t \cdot W_1(t, Z_0 | t^*) - c \ge 0$    (5.30)

Here $W_1(t, Z_0 | t^*)$ is the gain at time $t$ of buying new information
at $t^*$ (see Fig. 5.6) and $p_t$ is the probability that the decision
will occur at time $t$. If $W_1(t, Z_0)$ is increasing with $t$ (which is
often the case, especially if we are not too close to the horizon), then
$t_1(Z_0)$, satisfying the equation

    $W_1(t_1, Z_0) = 0$    (5.31)

is a lower bound on $t^*(Z_0)$.

(2) Upper bound on $t^*(Z_0)$: Considering again one purchase of
information (say at $t^*$), the expected benefit from this information
is minimum if it is intended for use at $t^*$ only (this is the case
when, for instance, the next purchase will be at $t^*+1$). This minimum
value cannot be greater than the cost of information (because if the
expected benefit at $t$ exceeds the cost of information, we cannot
be worse off by buying it at $t$ rather than at a later time).
Therefore we have

    $p_{t^*} \cdot W_1(t^*, Z_0 | t^*) \le c$    (5.32)

Assuming that $p_t \cdot W_1(t, Z_0 | t)$ is increasing with $t$ (which is
very often the case), then $t_2(Z_0)$ satisfying the equation

    $p_{t_2} \cdot W_1(t_2, Z_0 | t_2) = c$    (5.33)

is an upper bound on $t^*(Z_0)$.

Example 5.1. We have assumed so far that the payoff function is
quadratic. We have shown that for a quadratic payoff function and a
normal state $t^*$ is independent of $Z_0$ (Theorem 5.2). Here we give an
example of a nonquadratic payoff function for which $t^*$ depends on $Z_0$
(for a linear Markovian system). Then we use the results of this
section to find lower and upper bounds on $t^*(Z_0)$. Let us consider
the following payoff function:

    $v_1(s, d) = s^2 d - \tfrac{1}{2} d^2$    (5.34)

Maximizing $v_1(s, d)$ is equivalent to minimizing $(s^2 - d)^2$. Assuming
that $s(t)$ changes according to the linear Markovian system,

    $s(t) = \lambda s(t-1) + \epsilon(t)$

and has a normal distribution, we can show that

    $V_1(t, s_0) = \tfrac{1}{2} \left[ \lambda^{2t} s_0^2 + \sigma_s (1 - \lambda^{2t}) \right]^2$

    $V'_1(t, s_0 | T) = \tfrac{1}{2} \left[ \lambda^{2t} s_0^2 + \sigma_s (1 - \lambda^{2t}) \right]^2 + \sigma_s (1 - \lambda^{2T}) \left[ 2 \lambda^{2T} s_0^2 + \sigma_s (1 - \lambda^{2T}) \right] \lambda^{4(t-T)}$

Therefore the expected benefit at time $t$ from buying new information
at $T$ is:

    $W_1(t, s_0 | T) = V'_1(t, s_0 | T) - V_1(t, s_0) = \sigma_s (1 - \lambda^{2T}) \left[ 2 \lambda^{2T} s_0^2 + \sigma_s (1 - \lambda^{2T}) \right] \lambda^{4(t-T)}$    (5.35)


Note that $W_1(t, s_0 | T)$ depends on $s_0$, and therefore we expect that
$t^*$ will be a function of $s_0$ too. In the following we will find upper
and lower bounds for $t^*(s_0)$.
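
Equation (5.35) can be cross-checked against the moments of a normal variable. With the payoff (5.34) the best decision is the conditional mean of $s^2$, so the gain from new information at T is one half of the variance, given $s_0$, of the posterior mean of $s^2(t)$. The Python sketch below compares the closed form (5.35) with that variance, using the standard result that the square of a normal variable with mean m and variance v has variance $4m^2v + 2v^2$. The function names are hypothetical and the parameter values are arbitrary.

```python
def w1_closed_form(t, s0, T, lam, sigma_s):
    """Value at time t of new information bought at T, as in Eq. (5.35)."""
    v = sigma_s * (1.0 - lam ** (2 * T))
    return v * (2.0 * lam ** (2 * T) * s0 ** 2 + v) * lam ** (4 * (t - T))


def w1_from_moments(t, s0, T, lam, sigma_s):
    """The same quantity computed from the moments of s(T) given s0."""
    mean_T = lam ** T * s0                       # Eq. (5.26)
    var_T = sigma_s * (1.0 - lam ** (2 * T))     # from Eqs. (5.26), (5.27)
    # Variance of s(T)**2 when s(T) is normal with the moments above.
    var_of_square = 4.0 * mean_T ** 2 * var_T + 2.0 * var_T ** 2
    # The posterior mean of s(t)**2 is lam**(2(t-T)) * s(T)**2 plus a constant.
    return 0.5 * lam ** (4 * (t - T)) * var_of_square


if __name__ == "__main__":
    for s0 in (0.0, 1.0, 2.0):
        a = w1_closed_form(8, s0, 3, 0.9, 2.5)
        b = w1_from_moments(8, s0, 3, 0.9, 2.5)
        print(s0, a, b)
```

The printed values differ across the three choices of `s0`, which is the dependence on the previous observation noted in the text.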


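(1) Upper bound on $t^*(s_0)$: From Eq. (5.33) we have (assuming
$g_t = g$ to be constant):

    $g \cdot W_1(t_2, s_0 | t_2) = c$    (5.36)

Substituting for $W_1(t_2, s_0 | t_2)$ from (5.35) we have

    $g \cdot \sigma_s (1 - \lambda^{2t_2}) \left[ 2 \lambda^{2t_2} s_0^2 + \sigma_s (1 - \lambda^{2t_2}) \right] = c$    (5.37)

Solving (5.37) for $\lambda^{2t_2}$, we find

    $\lambda^{2t_2} = \frac{\sigma_s - s_0^2}{\sigma_s - 2 s_0^2} \pm \sqrt{ \left( \frac{\sigma_s - s_0^2}{\sigma_s - 2 s_0^2} \right)^2 - \frac{\sigma_s - c/(g \sigma_s)}{\sigma_s - 2 s_0^2} }$    (5.38)

where the sign is chosen so that $0 < \lambda^{2t_2} < 1$, and $t_2$ is an
upper bound for $t^*(s_0)$.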


(2) Lower bound on $t^*(s_0)$: From Eq. (5.31) we have

    $\sum_{t=t_1}^{\infty} p_t \cdot W_1(t, s_0 | t_1) = c$

Substituting for $W_1(t, s_0 | t_1)$ from Eq. (5.35) we find:

    $\sum_{t=t_1}^{\infty} g \left[ (1-g) \lambda^4 \right]^{t - t_1} \cdot W_1(t_1, s_0 | t_1) = c$

or

    $\frac{g}{1 - (1-g) \lambda^4} \cdot W_1(t_1, s_0 | t_1) = c$    (5.39)

Comparing this equation with that of the upper bound (5.36) we notice
that they are identical, except that $g$ in (5.36) is replaced by
$u = g / (1 - (1-g) \lambda^4)$ in (5.39). Therefore $t_1$ can be obtained
from (5.38) by substituting $u$ for $g$ in this equation. If we draw
$W_1(t, s_0 | t)$ from Eq. (5.35), then from (5.36) and (5.39), $t_2$ and
$t_1$ are the intersections of the horizontal lines of heights $c/g$ and
$c/u$ with $W_1(t, s_0 | t)$, respectively. This is shown in Fig. 5.7.
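
The two intersections can also be found numerically. The following Python sketch evaluates $W_1(t, s_0 | t)$ from (5.35) with $T = t$ and locates the crossings with the levels $c/u$ and $c/g$ by bisection, treating $t$ as a continuous variable. It assumes $s_0^2 \le \sigma_s$, so that the curve is increasing in $t$; the function names and the search interval `t_max` are hypothetical choices for this illustration. The parameter values in the usage example are those quoted in the text for Fig. 5.8.

```python
import math


def w1_diag(t, s0, lam, sigma_s):
    """W_1(t, s0 | t): value at t of information bought at t, from Eq. (5.35)."""
    v = sigma_s * (1.0 - lam ** (2.0 * t))
    return v * (2.0 * lam ** (2.0 * t) * s0 ** 2 + v)


def crossing_time(level, s0, lam, sigma_s, t_max=1000.0, tol=1e-10):
    """Time at which W_1(t, s0 | t) reaches `level`, found by bisection.

    Assumes s0**2 <= sigma_s, so that the curve is increasing in t.
    """
    if w1_diag(t_max, s0, lam, sigma_s) < level:
        raise ValueError("the level is never reached on [0, t_max]")
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w1_diag(mid, s0, lam, sigma_s) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def recovery_time_bounds(s0, g, lam, sigma_s, c):
    """Return (t1, t2), the lower and upper bounds from (5.39) and (5.36)."""
    u = g / (1.0 - (1.0 - g) * lam ** 4)
    t_lower = crossing_time(c / u, s0, lam, sigma_s)
    t_upper = crossing_time(c / g, s0, lam, sigma_s)
    return t_lower, t_upper


if __name__ == "__main__":
    t1, t2 = recovery_time_bounds(s0=0.0, g=0.2, lam=0.9, sigma_s=2.5, c=1.0)
    print("lower bound:", t1, "upper bound:", t2)
```

Because $u > g$ whenever $0 < g < 1$ and $0 < \lambda < 1$, the level $c/u$ lies below $c/g$, so on an increasing curve the lower bound is reached first.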


Figure 5.7. Upper and lower bounds for $t^*$, for a given value of $s_0$.

For $s_0^2 \le \sigma_s$, $W_1(t, s_0 | t)$ always increases with $t$.
Recall that this was necessary for the bounds to be valid. For
$s_0^2 > \sigma_s$, $W_1(t, s_0 | t)$ has an overshoot and does not always
increase. Nevertheless, we can find the correct bounds by taking those
intersections of the horizontal lines $c/g$ and $c/u$ with
$W_1(t, s_0 | t)$ which are in the increasing segment of
$W_1(t, s_0 | t)$. In Fig. 5.8 the upper and lower bounds as functions of
$s_0$ are depicted for numerical values of $g = .2$, $\lambda = .9$,
$\sigma_s = 2.5$, and $c = 1$.

Figure 5.8. Upper and lower bounds as functions of $s_0$.


5.8 Summary


In this chapter the a posteriori optimal information recovery
policies for a one-time contingent decision were investigated. The
optimality condition for this case was shown to be the same as the
optimality condition for the a priori case except that the a posteriori
payoffs were substituted for the a priori ones. The optimal information
recovery time depends, in general, on the result of the previous
observations of the state.


In Chapter 4, where we studied the a priori information recovery
policies, the main variable in the formulation of the optimality condition
was the "residual value of the past information" at each point of time.
In this chapter we have defined a new variable, namely the "value of
new  information"  at  each  point  of  time.  The  two  variables  are  intimately 
related  and  may  be  considered  dual  to  each  other.  In  addition  to  simpli¬ 
fying  the  optimality  condition,  the  new  variable  allowed  us  to  make  an 
important  observation  about  the  optimal  information  recovery  policies, 
concerning  the  following  question:  Under  what  conditions  is  the  optimal 
information  recovery  time  independent  of  the  result  of  the  previous 
information  recoveries?  Finding  such  conditions  is  clearly  very  important, 
because  the  a  posteriori  optimal  policy  would  be  the  same  as  the  a  priori 
one  under  these  conditions  and,  therefore, the  computation  of  the  optimal 
policy  is  greatly  simplified.  Using  the  new  variable  we  have  shown  that 
if  the  (total)  value  of  the  new  information,  when  it  is  bought  only  once, 
is  always  independent  of  the  result  of  the  previous  observations,  then 
the  optimal  information  recovery  time  is  independent  of  the  result  of  the 
previous  observations.  This  result  is  general  and  assumes  no  conditions 
on  the  state,  the  decision,  or  the  information  structure.  It  is  also 
relatively  easy  to  check.  An  important  example  of  the  above  property  is 
when  the  payoff  function  is  quadratic  and  the  state  and  the  observation 
have  normal  distributions. 

Finally, when the optimal information recovery time depends on the
result of the previous observations, we have found both upper and lower
bounds for it. These bounds depend on the result of the previous
observations.




CHAPTER  6 


OPTIMAL  INFORMATION  RECOVERY  FOR  REPETITIVE 
CONTINGENT  DECISIONS 

In  Chapters  4  and  5  we  investigated  the  optimal  information 
recovery  policies  for  decisions  which  occur  only  once.  In  this  chapter  we 
will  extend  our  study  to  the  case  where  the  decision  may  be  repeated  in 
time.  For  simplicity  we  will  restrict  ourselves  to  the  a  priori  policies. 
We  know  from  Chapter  5  that  for  an  important  class  of  problems  the 
a  priori  and  the  a  posteriori  policies  are  the  same. 

6.1  Differences  with  the  One-Time  Decision  Case 

The  most  important  difference  with  the  one-time  decision  case  is, 
of  course,  in  the  occurrence  model.  We  assume  that  each  occurrence  of 
the  decision  is  independent  of  the  previous  occurrences, 

$$ \{\text{decision occurs at } t \mid \mathcal{E}\} = g_t \tag{6.1} $$

For simplicity we assume throughout this chapter that the horizon is
infinite and $g_t$ is constant in time. There are two important differences
between  the  two  cases,  concerning  the  recovery  of  information.  In  the 
repetitive  case,  (1)  each  piece  of  information  may  be  used  for  more  than 
one  decision,  and  (2)  there  may  be  an  opportunity  to  learn  about  the 
state  of  the  system  at  the  time  of  a  decision  from  the  outcome  of  that 
decision and use this information for later decisions. The first property
results in buying more information compared to the one-time decision
case because information is potentially more valuable in this case.
The second property, however, results in buying less information, since
we obtain some free information from each decision. For the repetitive
case we have to answer the following questions:

(1) What is $T^*$, namely the optimum recovery period assuming
there is no interruption by decisions?

(2) How should $T^*$ be revised when a decision occurs (and we
receive some information from it)?

Clearly,  the  answers  to  these  questions  depend  on  the  type  of 
information  learned  from  each  decision.  In  the  following  we  will  answer 
the questions for four types of information learned from decisions:
no information, perfect information, perfect but delayed information,
and prompt but imperfect information.

Finally,  since  we  assume  infinite  horizon  we  have  to  either  discount 
the  future  payoffs  (using  the  present-value  criterion)  or  use  other 
criteria  (the  rate  of  payoff,  for  example)  to  avoid  the  unboundedness  of 
the  total  payoff. 

6.2  Optimal  Recovery  of  Information 

I.  No  information  from  decisions. 

This is an extreme case, where we learn nothing about the state
from a decision. This might be the case when, for example, the outcome
of the decision will not be known until after a very long delay. The
necessary condition for the optimum recovery period $T^*$ (the recovery
period with no interruption by decisions) can be obtained by the same
argument as in the one-time decision case (Section 4.2). From the result
of that argument, $T^*$ can be obtained by equating the expected payoffs
of recovery at $T^*$ and at $T^* + 1$, both calculated at $t = T^*$
(see Eq. (4.15)). These expected payoffs are shown in Fig. 6.1.


Figure 6.1. Payoffs at $T^*$ with no information from decisions


If information is recovered at $T^*$, the maximum present value (at
$T^*$) of all future payoffs is denoted by $\bar{V}(T^*)$. If information
is recovered at $T^* + 1$, then the maximum present value (at $T^*$) of
all future payoffs is $R(T^*) + \beta \cdot \bar{V}(T^*)$ if the decision
happens at $T^*$ ($R(T^*)$ is the residual value of past information at
$T^*$ and $\beta$ is the discount rate), and $\beta \cdot \bar{V}(T^*)$
if the decision does not happen at $T^*$. Equating the expected payoffs
of recovery at $T^*$ and $T^* + 1$, we have

$$ \bar{V}(T^*) = g\,[R(T^*) + \beta \bar{V}(T^*)] + (1-g)\,\beta \bar{V}(T^*) $$

After simplifying we find

$$ (1 - \beta)\,\bar{V}(T^*) = g \cdot R(T^*) \tag{6.2} $$


To find the interpretation of this condition, note that the present
value of a sequence of uniform payments (paid each period) of amount
$(1 - \beta)\bar{V}(T^*)$ is

$$ (1-\beta)\bar{V}(T^*)\,[1 + \beta + \beta^2 + \cdots] = (1-\beta)\bar{V}(T^*) \cdot \frac{1}{1-\beta} = \bar{V}(T^*) $$

Therefore, $(1 - \beta)\bar{V}(T^*)$ is the uniform payment equivalent of
$\bar{V}(T^*)$, and we denote it by $\bar{V}_u(T^*)$. Therefore (6.2) can
be written as:

$$ \bar{V}_u(T^*) = g \cdot R(T^*) \tag{6.3} $$

This  condition  states  that  the  uniform  payment  equivalent  of  the  optimal 
policy  is  equal  to  the  expected  payoff  immediately  before  recovery. 

Comparing (6.3) with the optimality condition for the one-time decision
case,

$$ V(T^*) = R(T^*) $$

we can see a strong similarity between the two conditions. In the
one-time decision case we can think that the decision will ultimately
happen, and the expected payoff is equal to $R(T^*)$. In the repetitive
case, the uniform payment equivalent at each period is $R(T^*)$ multiplied
by the probability that the decision will occur at each period ($g$).

Since no information is learned from the decisions, the occurrence of
a decision has no effect on the next recovery times and we will have a
periodic recovery. Since in the repetitive case the result of each
observation may be used more than once to make the decision, the
information is more valuable than in the one-time decision case.
Consequently, we buy more information in the repetitive case and
therefore $T^*$ for the repetitive case (with no information from
decisions) is smaller than $T^*$ for the one-time decision case.
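Condition (6.3) can be checked numerically on a small example. The sketch below is not the derivation used in this chapter. It assumes a particular timing convention (the cost $c$ is paid at each recovery, and in each of the following $T$ periods the decision occurs with probability $g$ and pays the residual value $R(k)$ of information of age $k$) and a hypothetical geometric form for $R$. Function names and parameter values are illustrative. Under these assumptions the present value of a periodic policy is maximized directly over integer periods, and the integer optimum should bracket condition (6.3): $g R(T^*) \le (1-\beta) V(T^*) \le g R(T^*-1)$.

```python
def present_value(T, g, beta, c, R):
    """Present value of recovering information every T periods.

    Assumed timing: pay c at each recovery; in period k after a recovery
    (k = 0, ..., T-1) the decision occurs with probability g and pays R(k).
    """
    one_cycle = -c + sum(g * beta**k * R(k) for k in range(T))
    return one_cycle / (1.0 - beta**T)  # the cycle repeats every T periods


def optimal_period(g, beta, c, R, T_max=200):
    """Integer period in 1..T_max with the largest present value."""
    return max(range(1, T_max + 1),
               key=lambda T: present_value(T, g, beta, c, R))


def R(k):
    """Hypothetical residual value of information of age k."""
    return 10.0 * 0.9**k


g, beta, c = 0.2, 0.95, 1.0  # illustrative values only
T_star = optimal_period(g, beta, c, R)

# Uniform payment equivalent of the optimal policy, as in (6.3).
V_u = (1 - beta) * present_value(T_star, g, beta, c, R)
```

With these numbers the search returns a period of three, and the uniform payment equivalent falls between $g R(3)$ and $g R(2)$, which is the discrete counterpart of the equality in (6.3).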

II.  Perfect  information  from  decisions. 

This case is the other extreme, namely when we learn perfect
information about $s(t)$ from each decision. Since in this case the
occurrence of a decision provides us with perfect information about the
state, the past information will have no value after a decision occurs.
As a result, each recovery of information has exactly the same value as
in the case of the one-time decision (because in the one-time decision
case the information is also used for making the decision once). It
follows that $T^*$ is the same as $T^*$ in the one-time decision case.
After each occurrence of the decision, however, we must revise the
planned time for the next recovery. Since the occurrence of each decision
is tantamount to a new observation, the next recovery time must be revised
to $T^*$ units after the time of the decision. Therefore the next recovery
time is always $T^*$ units from either the last recovery time or the last
decision, whichever occurs later (Fig. 6.2).


Figure 6.2. Optimal information recovery with perfect
information from decisions. (Legend: information recovery; occurrence
of decision.)
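The revision rule of this case (the next recovery is always $T^*$ units after the last recovery or the last decision, whichever is later) can be sketched as a small scheduling routine. This is an illustration under assumptions not stated in the text: time is an integer, an initial recovery is taken to occur at time 0, and a decision that falls exactly on a planned recovery time does not cancel that recovery. The function name is hypothetical.

```python
def recovery_times(decision_times, T_star, horizon):
    """Times at which information is bought, given the decision times.

    Each decision yields perfect information, so it resets the clock
    exactly as a recovery would.
    """
    recoveries = []
    last_reset = 0                    # assumed initial recovery at time 0
    pending = sorted(decision_times)
    i = 0
    while True:
        planned = last_reset + T_star
        if i < len(pending) and pending[i] < planned:
            last_reset = pending[i]   # a decision arrives first: restart
            i += 1
            continue
        if planned > horizon:
            break
        recoveries.append(planned)    # no decision intervened: buy now
        last_reset = planned
    return recoveries


schedule = recovery_times([3, 4, 12], T_star=5, horizon=25)
```

In this example the decisions at times 3 and 4 postpone the first purchase to time 9, and the decision at time 12 postpones the next one to time 17.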


Notice, however, that this policy maximizes the rate of expected
payoff over time, but not necessarily the present value of the expected
payoff. Let us find the optimality condition when the present-value
criterion is used. By the same argument as before, we can find $T^*$ by
equating the expected payoffs (at $T^*$) of recovery at $T^*$ and recovery
at $T^* + 1$. These payoffs are shown in Fig. 6.3. The present value (at
$T^*$) of the expected payoff with recovery at $T^*$ is denoted by
$\bar{V}(T^*)$. Now consider the expected payoffs when we plan to buy
information at $T^* + 1$. If the decision occurs at $T^*$, the expected
payoff at $T^*$ is $R(T^*)$, and since we learn perfect information from
this decision we will not buy information at $T^* + 1$, and the expected
future payoff is $\bar{V}(T^*) + c$ ($c$ is the cost of information)
because we obtained free information from the decision at $T^*$.†
Therefore the total present value of the expected payoff, if the decision
occurs at $T^*$, is $R(T^*) + \beta(\bar{V}(T^*) + c)$. If the decision
does not occur at $T^*$, then we buy information at $T^* + 1$ and the
present value (at $T^*$) of the expected payoff is $\beta \bar{V}(T^*)$.
Setting the present expected values of recovery at $T^*$ and $T^* + 1$
equal and simplifying, we find

$$ (1 - \beta) \cdot \bar{V}(T^*) = g \cdot R(T^*) + g \cdot \beta \cdot c $$

and since $(1 - \beta) \cdot \bar{V}(T^*)$ is the uniform payment
equivalent of $\bar{V}(T^*)$, namely $\bar{V}_u(T^*)$, we have

$$ \bar{V}_u(T^*) = g \cdot R(T^*) + \beta \cdot g \cdot c \tag{6.4} $$

†This is an approximation because $\bar{V}(T^*) + c$ is the expected
future payoff if information was learned at $T^* + 1$, while information
is in fact learned at $T^*$.



Comparing this equation with the optimality condition for the case of
no information from the decision (Eq. (6.3)), we notice that the benefits
of learning $s(t)$ from each decision are represented by the term
$\beta \cdot g \cdot c$. This term is the uniform payment equivalent of
the benefits of learning from decisions. Intuitively, these benefits must
be proportional to the cost of information ($c$) and the probability of
the occurrence of the decision ($g$). The reason $g \cdot c$ is discounted
by $\beta$ is that information from a decision (say at $t$) can be used
for decisions from $t + 1$ on (but not at $t$), while if the information
is bought at $t$, it can be used from $t$ on. Therefore information from
the decision has an inherent one-unit delay, and that is why $g \cdot c$
is discounted by $\beta$.

III.  Perfect but delayed information from decisions.

In this case we assume that perfect information can be learned from
each decision, but after a delay $\tau$. Therefore, if the decision
happens at time $t$, we will learn $s(t)$ at time $t + \tau$. This is the
case when, for instance, the outcome of the decision will be revealed
after a delay. Again we have to find $T^*$ (the optimal recovery period
with no interruptions by decisions), and the manner in which the planned
recovery time must be revised after the occurrence of a decision. Let
us define

$t_i$ = time of the $i$th occurrence of the decision
after the last recovery,

$T_i$ = optimal time for the next recovery after the $i$th
occurrence of the decision.

Proposition 6.1. If the time of receiving information from the
decision ($t_i + \tau$) is before the planned time of the next recovery
($T_{i-1}$), then the next recovery time must be revised to $t_i + T^*$.
If the information from the decision is revealed after the planned time
for the next recovery, then the next recovery time either remains
unchanged or changes to $t_i + T^*$:

(1) $t_i + \tau \le T_{i-1} \;\Rightarrow\; T_i = t_i + T^*$

(2) $t_i + \tau > T_{i-1} \;\Rightarrow\; T_i = T_{i-1}$ or $T_i = t_i + T^*$

Figure 6.4 illustrates the above statements.


Proof: (1) When $t_i + \tau \le T_{i-1}$, the information from the
decision is revealed to us before the planned time for the next recovery
and we can benefit from this information. After the information is
revealed, it is as if a recovery was made at $t_i$. Therefore the next
recovery time must be revised to $t_i + T^*$.

(2) If $t_i + \tau > T_{i-1}$, the information from the decision will be
revealed to us after the planned time for the next recovery. Therefore,
if we want to obtain the benefits of this free information, we should not
buy information at the planned time $T_{i-1}$. If we decide to wait, then
$t_i$ should be regarded as the last recovery time and therefore
$T_i = t_i + T^*$. There is an expected loss in so doing, however, because
we will be low on information from $T_{i-1}$ to $t_i + \tau$ (Fig. 6.5).


Figure 6.5. Loss due to not buying information at $T_{i-1}$


If this loss exceeds the benefits of the free information from the
decision, we will be better off by ignoring the information from the
decision and buying information at the planned time $T_{i-1}$. Therefore
in this case $T_i$ is either $t_i + T^*$ or $T_{i-1}$.
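The two cases of Proposition 6.1 can be written as a short decision rule. The sketch below is a hedged illustration rather than part of the proof. The choice in case (2) is made by comparing the cost of information $c$ with the loss $L$, which is an assumption borrowed from the $\mathrm{Max}(0, c - L)$ term of condition (6.5). The loss $L$ is taken as a given input and is not computed here. The function and argument names are hypothetical.

```python
def revise_next_recovery(t_i, tau, T_prev, T_star, c, L):
    """Planned time of the next recovery after the i-th decision.

    t_i    : time of the i-th occurrence of the decision
    tau    : delay before the outcome of the decision reveals s(t_i)
    T_prev : recovery time planned before this decision (T_{i-1})
    T_star : optimal recovery period with no interruption by decisions
    c, L   : cost of information and present value of the waiting loss
             (L is an input; this sketch does not derive it)
    """
    if t_i + tau <= T_prev:
        # Case (1): the free information arrives before the planned recovery.
        return t_i + T_star
    # Case (2): waiting is worthwhile only if the saved cost exceeds the loss.
    if c - L > 0:
        return t_i + T_star
    return T_prev


early = revise_next_recovery(t_i=10, tau=2, T_prev=15, T_star=8, c=1.0, L=0.3)
wait = revise_next_recovery(t_i=10, tau=7, T_prev=15, T_star=8, c=1.0, L=0.3)
keep = revise_next_recovery(t_i=10, tau=7, T_prev=15, T_star=8, c=1.0, L=1.5)
```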

The optimality condition for $T^*$ can be found, as before, by
equating the present value (at $T^*$) of the expected payoffs with
recovery at $T^*$ and $T^* + 1$. However, since the recovery at $T^* + 1$
depends on whether or not the decision will occur at $T^*$, we must in
fact equate the present value of the expected payoffs (at $T^*$) of buying
information and waiting as shown in Fig. 6.6. This figure is the same
as Fig. 6.3 (perfect and prompt information from decision), except that
if we do not buy information at $T^*$, and a decision occurs at $T^*$,
then we have to decide whether to buy information as planned (at
$T^* + 1$) or to wait and benefit from the information from the decision
at $T^* + \tau$. If we buy information at $T^* + 1$ as planned, the
present value of the expected payoff is $R(T^*) + \beta \bar{V}(T^*)$.
If we wait, we must buy information at $T^* + T^* = 2T^*$. The present
value of the expected payoff is approximately†
$R(T^*) + \beta(\bar{V}(T^*) + c - L)$, where $L$ is the present value of
the loss ($\ell$) by not buying information at $T^* + 1$, as shown in
Fig. 6.7.

†Ignoring the one-unit delay of information from the decision.


Figure 6.6. Expected payoffs with perfect but delayed information
from the decisions. (Branch payoffs: $R(T^*) + \beta \bar{V}(T^*)$;
$R(T^*) + \beta(\bar{V}(T^*) + c - L)$; $\beta \cdot \bar{V}(T^*)$.)


Figure 6.7. Loss of not buying information at $T^* + 1$


Equating the present value of the expected payoff of buying
information at $T^*$ with that of waiting, we have

$$ \bar{V}(T^*) = g \cdot \max\{R(T^*) + \beta \bar{V}(T^*),\; R(T^*) + \beta(\bar{V}(T^*) + c - L)\} + (1-g)\,\beta \bar{V}(T^*) $$

After simplifying we find

$$ \bar{V}_u(T^*) = g \cdot R(T^*) + \beta \cdot g \cdot \max(0,\, c - L) \tag{6.5} $$



where $\bar{V}_u(T^*) = (1 - \beta)\bar{V}(T^*)$ is, as before, the uniform
payment equivalent of $\bar{V}(T^*)$. Comparing this equation with the
optimality conditions of the previous cases (Eqs. (6.3) and (6.4)) we
notice that the difference is in the second term on the right-hand side,
which is the uniform payment equivalent of benefits of learning from the
decisions. Moreover, (6.3) and (6.4) can be easily obtained from (6.5).
If $\tau = 0$ (information prompt) then $L = 0$ and (6.5) is reduced to
(6.4). On the other hand, if $\tau$ is very large (which is equivalent to
not learning from decisions), then $c - L < 0$ and (6.5) reduces to (6.3).
For $\tau$ greater than some $\tau_{max}$, $c - L$ remains negative and
$T^*$ is the same as $T^*$ in the case where we learn no information from
the decisions ($T^*_{min}$). It is clear that $\tau_{max}$ must be smaller
than $T^*_{min}$, because otherwise we could not start using the
information from the decisions. As $\tau$ decreases below $\tau_{max}$,
$T^*$ increases. For $\tau = 0$, $T^*$ reaches its maximum ($T^*_{max}$),
which is the same as $T^*$ in the case of perfect and prompt information.
$T^*$ as a function of $\tau$ is shown in Fig. 6.8. Intuitively, as $\tau$
decreases below $\tau_{max}$ we learn more from the decisions, and
consequently we buy less information anticipating more free information
from a coming decision.


Figure 6.8. $T^*$ as a function of $\tau$
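The way condition (6.5) contains the two earlier conditions as limiting cases can be illustrated with a few lines of Python. This is a sketch only: the function name and the numerical values are assumptions made for illustration, and $R$ and $L$ are passed in as plain numbers rather than computed from a model of the state.

```python
def rhs_delayed(g, R, beta, c, L):
    """Right-hand side of (6.5): g*R + beta*g*max(0, c - L)."""
    return g * R + beta * g * max(0.0, c - L)


g, R_val, beta, c = 0.2, 2.0, 0.9, 1.0    # illustrative values only

prompt = rhs_delayed(g, R_val, beta, c, L=0.0)       # tau = 0 gives L = 0
partial = rhs_delayed(g, R_val, beta, c, L=0.4)      # intermediate delay
useless = rhs_delayed(g, R_val, beta, c, L=2.0)      # large delay: c - L < 0

rhs_6_4 = g * R_val + beta * g * c   # condition (6.4), perfect and prompt
rhs_6_3 = g * R_val                  # condition (6.3), no information
```

The intermediate value lies between the two limits, which matches the statement that the benefit term shrinks as the loss from waiting grows.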


IV.  Prompt  but  imperfect  information  from  decisions. 

Finally we study the case where prompt but imperfect information is
learned from decisions. This is the case where the outcome of a decision
is revealed promptly but the state cannot be perfectly observed from the
outcome. Let us assume that each time a decision is made, the level of
our information jumps to $\theta \cdot v_o$, where $v_o$ is the level of
information immediately after a perfect observation and $\theta$ is a
constant less than 1 (Fig. 6.9). We assume that the information from the
decision perishes [...]


Figure 6.9. Prompt but imperfect information from decisions.


[...] (at $T^*$) equal. The decision tree is the same as in the previous
cases except for the payoff when we wait at $T^*$ and the decision happens
at $T^*$ (Fig. 6.11). In this case we will not buy information at
$T^* + 1$, but will rather wait until $T^* - \Delta + T^*$, and the
expected payoff is $\bar{V}'(T^*)$.


Figure 6.11. Expected payoffs at $T^*$ with prompt but imperfect
information from decisions.


$\bar{V}'(T^*)$ can be found from Fig. 6.12. Notice that the payoff from
time $T^* + 1$ on (with the information from the decision) is the same
(but starting $\Delta$ units earlier) as the expected payoff from time
$T^* + 1 + \Delta$ on with an observation at $T^* + 1$. Therefore we can
write

$$ \bar{V}'(T^*) = R(T^*) + \beta^{-\Delta+1}\,[\bar{V}(T^*) + c - L] \tag{6.6} $$

where $L$ is the present value (at $T^* + 1$) of the equivalent loss
($\ell$), as shown in Fig. 6.12.


Figure 6.12. Loss of not buying information at $T^* + 1$


The optimality condition is:

$$ \bar{V}(T^*) = g \cdot \bar{V}'(T^*) + (1-g)\,\beta \bar{V}(T^*) \tag{6.7} $$

Substituting for $\bar{V}'(T^*)$ in (6.7) from (6.6) and after simplifying
we find

$$ \bar{V}_u(T^*)\,[1 - g(1 + \beta^{-1} + \beta^{-2} + \cdots + \beta^{-\Delta+1})] = g\,R(T^*) + g\,\beta^{-\Delta+1}(c - L) \tag{6.8} $$
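The step from (6.6) and (6.7) to (6.8) can be checked numerically. The sketch below assumes the reading of (6.6) in which the bracketed term is multiplied by $\beta^{-\Delta+1}$, treats $\Delta$ as a positive integer, and takes $R$, $c$, and $L$ as given numbers. All names and values are illustrative. It solves the linear equation (6.7) for $\bar{V}$ and then evaluates both sides of (6.8).

```python
def solve_v_bar(g, beta, delta, R, c, L):
    """Solve (6.7), with V' taken from (6.6), for V-bar (linear in V-bar)."""
    k = beta ** (1 - delta)          # assumed discount factor in (6.6)
    return (g * R + g * k * (c - L)) / (1 - g * k - (1 - g) * beta)


def sides_of_6_8(g, beta, delta, R, c, L):
    """Left- and right-hand sides of (6.8) at the solution of (6.7)."""
    v_u = (1 - beta) * solve_v_bar(g, beta, delta, R, c, L)
    # 1 + beta^-1 + beta^-2 + ... + beta^-(delta-1)
    geometric = sum(beta ** (-j) for j in range(delta))
    lhs = v_u * (1 - g * geometric)
    rhs = g * R + g * beta ** (1 - delta) * (c - L)
    return lhs, rhs


lhs, rhs = sides_of_6_8(g=0.2, beta=0.9, delta=3, R=2.0, c=1.0, L=0.4)
```

The two sides agree to rounding error, which is consistent with (6.8) being an algebraic rearrangement of (6.6) and (6.7) under the stated reading.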


This condition is similar to the previous optimality conditions, but it
is somewhat complicated. The benefits of information from the decisions
are reflected on both sides of the equation, and therefore cannot be
interpreted as easily as before.


Figure 6.13. $T^*$ as a function of $\theta$


$T^*$ as a function of $\theta$ has the shape of Fig. 6.13. For $\theta$
less than some $\theta_o$, the recovery period is constant ($T^*_{min}$).
This corresponds to the case where the information from the decision is
less than the residual value of information at the recovery time (with no
information from decisions). Therefore, there is no value to this
information and $T^*_{min}$ is the same as $T^*$ with no information
(case I). As $\theta$ increases above $\theta_o$, the information from the
decisions is useful and $T^*$ increases (we buy less information
anticipating free information from a coming decision). For $\theta = 1$,
$T^*$ reaches its maximum ($T^*_{max}$), which is the same as $T^*$ in
case II. Recall that the same results were obtained in case III, where we
found $T^*$ as a function of the delay $\tau$. On the other hand, we
argued in case II that $T^*$ for that case ($= T^*_{max}$) is the same as
$T^*$ for the one-time decision case. From these observations we can
state that:

Proposition  6.2.  If  the  decision  is  repeated  in  time  we  always 
buy  more  information  than  if  it  happens  only  once. 

Comparing the results of this chapter with those of Chapter 4 (one-time
decision case) we can see that in the one-time decision case the
residual value of past information at recovery times, $R(T^*)$, is the
measure of the net expected payoff of the decision. In the repetitive
case, however, the expected payoff has two components. The first
component is (like the one-time decision case) the residual value of past
information at recovery times, which represents the expected payoff
without learning from decisions. The second component represents the
benefits of learning from decisions and depends on the cost of the
information and the probability of the occurrence of the decision. In
the one-time decision case an increase in $T^*$ implies less payoff
(because $R(T^*)$ decreases). This may not be true for the repetitive
case because an increase in $T^*$ may have resulted from learning more
information from decisions and, therefore, may imply a greater payoff.

6.3  Summary 

In this chapter we have extended the results obtained for a one-time
contingent decision (Chapter 4) to the case where the contingent decision
may be repeated in the future. We have noted two important differences
from  the  one-time  decision  case:  (1)  In  the  repetitive-decision  case 
each  piece  of  information  may  be  used  for  more  than  one  decision  and, 
therefore,  information  is  potentially  more  valuable;  and  (2)  there  may 
be  an  opportunity  to  learn  from  each  decision  about  the  state  at  the 
time  of  the  decision,  and  use  this  free  information  for  future  decisions. 
Four  cases  have  been  studied  corresponding  to  four  types  of  information 
learned  from  decisions:  no  information  from  decisions,  perfect  informa¬ 
tion  from  decisions,  perfect  but  delayed  information  from  decisions,  and 
prompt  but  imperfect  information  from  decisions. 

The optimality condition is found for each of the above cases. It
was noted that if this condition is written in terms of the
uniform-payment equivalent of the future payoffs, the optimality
condition will be very similar to the condition for the one-time decision
case. In all cases we found that for the optimal recovery policy the
residual value of the past information immediately before a recovery is a
measure of the expected payoff of each decision (as it was in the case of
a one-time decision). However, there is an additional component to the
payoff, which is due to the free information learned from decisions.
This component appears in the optimality condition as a separate term
and gives us the exact benefits of learning from decisions (provided
that information is recovered optimally).


It was shown that in the repetitive-decision case we always buy
more information than in the one-time decision case, because information
may be used for more than one decision and is, therefore, more valuable.
However, the more information we learn from the decisions, the less
information we will buy (anticipating free information from a
forthcoming decision). When perfect information is learned from each
decision, we will buy information at the same rate as in the one-time
decision case.

An interesting question regards the manner in which the time of the
next recovery must be revised after a decision occurs and free
information is learned (or will be learned) from the outcome of the
decision. It was noted that the next recovery time must, generally, be
changed to a time when the information from the decision has reduced in
value to the level just before a recovery. There are some exceptions to
this rule, however. For example, when we learn perfect but delayed
information from decisions, we have to wait before the information from
the decision is revealed. During this waiting period we may be low on
information and have to bear a loss if the decision occurs. If this loss
is larger than the expected benefit of the information from the decision,
we would be better off ignoring the information from the decision and
buying information at the previously planned time.


CHAPTER  7


SUMMARY 

7.1  Conclusions 

In the first part of this research the process of information
outdating was analyzed in time. We determined how the process depends on
the characteristics of the dynamic environment as well as on the decision
for which the information is used, and on the properties of the
information itself.

Assuming  a  quadratic  payoff  function  the  value  of  a  given  piece 
of  information  was  found  as  a  function  of  its  age.  The  result  was  then 
put  in  a  form  which  shows  separately  the  effect  of  various  factors  on 
the  value  of  information.  We  observed  that  the  dynamics  of  the  state 
(environment)  are  the  main  determinants  of  the  dynamics  of  information 
and  manifest  themselves  through  changes  over  time  of  the  mean  of  the 
state, given the state at the observation time. If the state has a
normal  distribution,  this  is  the  same  as  the  changes  over  time  of  the 
correlation  coefficient  of  the  current  state  with  the  state  at  the  time 
of  the  observation. 

The information structure influences the value of the information
through the variance of the posterior mean of the state, given fresh
information. The payoff function is represented by a single matrix
which is determined from the coefficient matrices of the payoff function.
Although the information structure and the payoff function are represented
by constant matrices, they can drastically influence the dynamics of the
information.

The equation for the value of the information indicates that the
outdating of information is a complicated process with a variety of
patterns. Specific results were obtained for linear Markovian systems.
It was shown that the eigenvalues of the state-space matrix of the system
play a major role in determining the pattern of the information outdating
process. We found that the value of the information may increase or
may  even  oscillate  with  time.  This  is  a  counter-intuitive  result, 
especially  for  a  Markovian  system.  It  was  shown,  however,  that  if  the 
information  is  perfect  its  value  always  decreases  with  time,  regardless 
of  the  dynamics  of  the  state  or  the  parameters  of  the  payoff  function. 
Other  cases  of  information  perishing  were  also  identified.  For  these 
cases,  bounds  for  the  rate  of  information  perishing  were  determined  by 
the  smallest  and  the  largest  eigenvalues  of  the  state-space  matrix  of 
the  system. 

The  information  outdating  process  for  autoregressive  systems  was 
also  investigated.  The  results  gave  insight  into  the  process  of  infor¬ 
mation  outdating,  in  particular  in  cases  where  the  value  of  the  informa¬ 
tion  is  enhanced  or  oscillates  with  time. 

In  the  second  part  of  this  research  we  have  investigated  the 
optimal  policies  for  recovery  (updating)  of  information  when  anticipating 
decisions  at  uncertain  times.  Both  the  a  priori  and  the  a  posteriori 
optimal  information  recovery  policies  were  studied.  The  case  of  a  one¬ 
time  contingent  decision  was  initially  analyzed.  The  results  were  then 
extended  to  the  case  of  repetitive  decisions  in  time. 


A necessary condition for an optimal information recovery policy
was found for the general case. It has the following interpretation: at
an optimal information recovery time, the expected marginal loss from buying
information slightly later equals the expected marginal gain in the
future. For a one-time decision with an infinite horizon, the
a priori optimality condition reduces to a simple form: the net
expected payoff of the decision equals the expected payoff that would
result if the decision were to occur when the information has its lowest
value (immediately before each recovery). The extra payoff of the decision,
were it to occur at other points in time, is just enough to pay for
the cost of the information.
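
The simple form of this condition can be checked numerically on a toy model. The sketch below rests on assumptions of ours that are not taken from this study: the decision occurs at an exponentially distributed (memoryless) time with rate `lam`; information is bought at time 0 and then every `T` time units until the decision occurs; there is no discounting; and the expected payoff of a decision made with information of age tau is `exp(-decay * tau)`. With these assumptions a renewal argument gives J = -cost + gain + exp(-lam * T) * J for the net expected payoff J. All names and parameter values are hypothetical.

```python
import math


def net_expected_payoff(T, lam=0.5, decay=1.0, cost=0.1):
    """Net expected payoff J(T) when information is re-bought every T time units."""
    # Expected payoff collected if the decision falls inside the current period:
    # integral over [0, T] of lam * exp(-lam * tau) * exp(-decay * tau) d tau.
    gain = lam / (lam + decay) * (1.0 - math.exp(-(lam + decay) * T))
    # Renewal argument: J = -cost + gain + exp(-lam * T) * J, solved for J.
    return (gain - cost) / (1.0 - math.exp(-lam * T))


def payoff_at_age(tau, decay=1.0):
    """Expected payoff if the decision occurs with information of age tau."""
    return math.exp(-decay * tau)


# Grid search for the a priori optimal recovery period (step 0.001).
grid = [k / 1000.0 for k in range(1, 5001)]
T_star = max(grid, key=net_expected_payoff)

# At the optimum, the net expected payoff should match the payoff obtained
# when the information is at its lowest value (age T_star).
gap = abs(net_expected_payoff(T_star) - payoff_at_age(T_star))
```

In this toy model `gap` is close to zero, up to the resolution of the grid, which agrees with the interpretation stated above.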

We have shown that the information recovery problem can be
formulated using, as our basic variable, either the "residual value of past
information," or the "value of new information" at each point of time.
The two variables are intimately related and may be regarded as duals.
It was found, however, that the latter is more appropriate, especially
for studying the a posteriori information recovery policies.

In addition to simplifying the optimality condition, the new
variable permitted an important observation about the optimal
information recovery policies, concerning the following question: Under what
conditions is the optimal information recovery time independent of the
result of the previous information recoveries? Under such conditions,
the a posteriori and the a priori information recovery policies are
identical and, therefore, the computation of the optimal policy is
greatly simplified. Using the new variable we showed that if the
(total) value of new information, given that it is bought only once,
is always independent of the result of the previous observations, then
each optimal recovery time is independent of the result of the previous
observations. This result is general and requires no condition on the
state, the decision, or the information structure. One important
example of the above property occurs when the payoff function is
quadratic and the state has a normal distribution.
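
The quadratic-normal example can be made concrete with a short sketch. It assumes a scalar normally distributed state, a previous noisy observation z of that state, and a payoff -(d - s)^2; these specifics, and all names and numbers, are our illustrative assumptions. The posterior variance after conditioning on z does not depend on the observed value of z, so the value of a further (here, perfect) observation does not depend on the earlier result either.

```python
def posterior(mean, var, z, noise_var):
    """Normal prior N(mean, var) updated with z = s + noise, noise ~ N(0, noise_var)."""
    gain = var / (var + noise_var)
    post_mean = mean + gain * (z - mean)
    post_var = (1.0 - gain) * var        # does not involve z
    return post_mean, post_var


def value_of_perfect_info(var):
    """Quadratic payoff -(d - s)^2: the best d is the mean, so the expected
    loss removed by a perfect observation equals the current variance."""
    return var


# Value of new information after three different earlier results z.
values_by_result = {
    z: value_of_perfect_info(posterior(0.0, 4.0, z, 1.0)[1])
    for z in (-3.0, 0.0, 5.0)
}
```

All three entries of `values_by_result` coincide, although the posterior means differ, which is the premise under which the a posteriori and the a priori policies are identical.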

In  the  repetitive-decision  case  there  are  two  distinct  differences 
from  the  one-time  decision  case:  (1)  each  piece  of  information  may  be 
used  for  more  than  one  decision  and,  therefore,  the  information  is 
potentially  more  valuable;  and  (2)  there  may  be  an  opportunity  to  learn 
about  the  state  from  each  decision  and  use  this  free  information  for 
future  decisions.  Several  cases  corresponding  to  various  types  of 
information learned from the decisions were studied. We found that for
an optimal information recovery policy the residual value of the
information immediately before a recovery is a measure of the expected payoff
of each decision (as it was in the case of a one-time decision). However,
in view of the free information learned from each decision, the
payoff contains an additional component. This component appears in the
optimality  condition  as  a  separate  term  and  yields  the  exact  benefits 
of  learning  from  the  decisions. 

It was shown that in the repetitive-decision case we always buy
more information than in the one-time decision case (because information
may be used for more than one decision and is, therefore, more valuable).
However, the more information learned from the decisions, the less
information we will buy. When perfect information is learned from each
decision, information will be bought at the same rate as in the one-time
decision case.

Finally,  the  effect  of  risk  aversion  on  the  optimal  information 
recovery  policies  was  studied  briefly.  The  optimality  condition  for  an 
expected-value  decision  maker  was  extended  to  the  case  of  a  risk-averse 
decision maker. It proved very difficult to compare the two cases,
however, because the payoff lotteries are different for the two
decision makers. Nevertheless, we found evidence suggesting that a
risk-averse decision maker would buy less information than an
expected-value decision maker when there is uncertainty regarding the
time of the decision. Whether (or under what circumstances) this is in
fact true remains uncertain.

7.2  Suggestions  for  Future  Research 

This research has been a first step in the investigation of
information in a dynamic framework. Our emphasis has been on the formulation
of the problem and the development of simple theoretical models which
give insight into the problem and provide a basis for future research.
There is thus considerable potential for future research to develop more
complete models based on real-world examples, and to establish more
concrete criteria for rationalizing the information production process.
Opportunities for future research exist in several areas, a few of which
are suggested below.

One immediate extension of the work in this study is to
consider a changing payoff function. We have assumed, for simplicity,
that the payoff function does not vary with time. This may not be the
case in many real-world situations. The effects of a time-varying payoff
function on the information perishing process, and on the optimal
information recovery policies, are unknown.

A second area would be to extend the results to a nonstationary
state (environment). This is important because nonstationarity is
typically the case in practice. There is no fundamental difficulty in
extending the results to nonstationary systems, although this will
increase the burden of computation.

An important extension of this study concerns the model of
information acquisition. In our model we have assumed that all the uncertain
states are simultaneously observed. A more realistic model would permit
the observation of various uncertain states at different points in time,
which implies that some states could be observed more frequently than
others.

Another  extension  would  allow  the  underlying  model  of  the  dynamic 
environment  to  be  updated.  In  this  study,  the  model  of  the  environment 
is  considered  to  be  exogenous.  The  model  can  be  endogenously  determined 
from  previously  accumulated  information,  and  revised  as  new  information 
is  acquired.  In  other  words,  the  model  itself  is  also  updated.  This 
is  clearly  a  more  appropriate  model  for  an  applied  study. 

APPENDIX


NOTATION 

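$s(t)$    one-dimensional state variable at time $t$.

$d$    one-dimensional decision variable.

$\underline{s}(t)$    vector of state variables at time $t$.

$\underline{s}'(t)$    transpose of $\underline{s}(t)$.

$\underline{d}$    vector of decision variables.

$v(\underline{s}(t), \underline{d})$    payoff of the decision $\underline{d}$ (at time $t$), given $\underline{s}(t)$.

$\mathcal{E}$    prior knowledge.

$\{\underline{s}(t) \mid \mathcal{E}\}$    joint probability distribution over $\underline{s}(t)$, given $\mathcal{E}$.

$\langle \underline{s}(t) \mid \mathcal{E} \rangle$    vector of expected value of $\underline{s}(t)$, given $\mathcal{E}$.

${}^{v}\langle \underline{s}(t) \mid \mathcal{E} \rangle$    covariance matrix of $\underline{s}(t)$, given $\mathcal{E}$.

$\eta(\underline{s}(t))$    information structure (observation) on $\underline{s}(t)$.

$\underline{z}(t)$    vector of observation of $\underline{s}(t)$ ($\underline{z}(t) = \eta(\underline{s}(t))$).

$\underline{y}(t)$    information (data) available at time $t$.

$\{\underline{s}(t) \mid \underline{y}(t), \mathcal{E}\}$    joint probability distribution over $\underline{s}(t)$, given $\underline{y}(t)$ and $\mathcal{E}$.

$\langle \underline{s}(t) \mid \underline{y}(t), \mathcal{E} \rangle$    vector of expected value of $\underline{s}(t)$, given $\underline{y}(t)$ and $\mathcal{E}$.

${}^{v}\langle \underline{s}(t) \mid \underline{y}(t), \mathcal{E} \rangle$    covariance matrix of $\underline{s}(t)$, given $\underline{y}(t)$ and $\mathcal{E}$.

$v_{\eta(t-\tau)}(t)$    expected value at time $t$ of $\eta(\underline{s}(t-\tau))$.

$v_{\eta}(\tau)$    $v_{\eta(t-\tau)}(t)$ for a stationary state variable.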

$v_{z(t_0)}(t)$    expected value at time $t$ of $\underline{z}(t_0)$.

$v_{z}$    $v_{z(t_0)}(t)$ for a stationary state.

$P(t)$    rate of information perishing.

K t)    posterior mean of $\underline{s}(t)$, given $\underline{y}(t)$.

h —o    covariance matrix of $\underline{s}(0)$.

$R(\tau)$    coefficient of linear approximation of $\underline{s}(t)$ as a function of $\underline{s}(t-\tau)$.

$Z$    set of observations $\underline{z}$.

$Z \, f \, Z'$    set $Z$ of observations $\underline{z}$ is "finer" than set $Z'$ of observations $\underline{z}'$.

$Z \, g \, Z'$    set $Z$ of observations $\underline{z}$ is "garbled" into set $Z'$ of observations $\underline{z}'$.

$\eta$ ^ $\eta'$    $\eta$ is "more informative" than $\eta'$.

$E$    precipitating event in a contingent decision.

$c$    cost of each observation.

8.    probability of the decision occurring at time $t$.

$Z_0$    result of the previous observations at time $t$.

$v_1(t, Z_0)$    expected payoff of the decision at time $t$, given $Z_0$.

$v_1(t)$    a priori expected payoff of the decision at time $t$.

$R(t)$    residual value of information at time $t$.

$V_{t_1}(t, Z_0)$    maximum expected payoff from time $t$ on, with the next recovery at $t_1$, given $Z_0$.

$V_{t_1}(t)$    a priori expected payoff from time $t$ on, with the next recovery at $t_1$.

$V(t, Z_0)$    maximum expected payoff from time $t$ on, with recovery at $t$, given $Z_0$ ($= V_t(t, Z_0)$).

$V(t)$    a priori maximum expected payoff from time $t$ on, with recovery at $t$ ($= V_t(t)$).

$V(T)$    a priori net expected future payoff, with information recovery period $T$.

$v_1'(t, Z_0)$    expected payoff of the decision at time $t$, with recovery at $t$, given $Z_0$.

$v_1'(t)$    a priori expected payoff of the decision at time $t$, with recovery at $t$.

$T^*$    a priori optimum information recovery period.

$t^*(Z_0)$    optimum time for the next information recovery, given the result of the previous observations, $Z_0$.

$U(L)$    expected utility of the lottery $L$.

$W_1(t, Z_0 \mid T)$    value of new information (obtained at $T$) for making the decision at $t$, given $Z_0$.

$W(t, Z_0)$    net value of all the future purchases of information with the first one at $t$, given $Z_0$.

$W_1(T, Z_0)$    total net value of information bought at $T$, with no more information in future, given $Z_0$.

$V_u(T)$    uniform-payment equivalent of $V(T)$.

$\delta$    rate of discount.